Transcript: Cloaking

Cloak and Dagger
In a nutshell…
• Cloaking
• Cloaking in search engines
• Search engines’ response to cloaking
• Lifetime of cloaked search results
• Cloaked pages in search results
• Ubiquity of advertising on the Internet.
• Search, by and large, enjoys primacy.
• Search Engine Optimisation (SEO) – doctoring of search results:
• For benign ends such as simplifying page content, optimizing load times, etc.
• For malicious purposes such as manipulating page-ranking algorithms.
Cloaking
• Conceals the true nature of a Web site.
• Keyword Stuffing – associating benign content with keywords.
• Attracting traffic to scam pages.
• Protecting the Web servers from being exposed.
• Not scamming those who arrive at the site via different keywords.
Types of Cloaking
• Repeat Cloaking
• User Agent Cloaking
• Referrer Cloaking (sometimes also called “Clickthrough Cloaking”)
• IP Cloaking
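To make the last three types concrete, here is a minimal sketch of how a cloaking server could decide which page to serve. All names, user-agent strings, and crawler domains are illustrative assumptions, not taken from any real cloaking kit; repeat cloaking would additionally require tracking visitor IPs across visits and is not shown.

```python
import socket

# Hypothetical reverse-DNS suffixes a cloaker might treat as crawler IPs.
CRAWLER_RDNS_SUFFIXES = (".googlebot.com", ".search.msn.com")

def is_crawler_ip(ip: str) -> bool:
    """IP cloaking: reverse-DNS the visitor's IP and check whether it
    resolves into a search engine's crawler domain."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    return hostname.endswith(CRAWLER_RDNS_SUFFIXES)

def choose_content(user_agent: str, referer: str, ip: str) -> str:
    """Return 'benign' or 'scam' depending on who appears to be asking."""
    ua = user_agent.lower()
    # User-agent or IP cloaking: crawlers get the benign page.
    if "googlebot" in ua or "bingbot" in ua or is_crawler_ip(ip):
        return "benign"
    # Referrer ("clickthrough") cloaking: only visitors arriving from a
    # search results page receive the scam content.
    if "google." in referer or "bing." in referer:
        return "scam"
    # Direct visitors (e.g. abuse investigators) also see the benign page.
    return "benign"
```

Note how the checks compose: a page can require both a browser user-agent and a search-engine referrer before revealing the scam content, which is exactly the combination the crawler profiles below are designed to tease apart.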
DAGGER
Dagger encompasses five different functions:
• Collection of search terms
• Querying search results generated by search engines
• Crawling search results
• Detecting cloaking
• Repeating the above four processes to study variance in measurements
Collection of Search Terms
Two different kinds of cloaked search terms are targeted:
• TYPE 1: Search terms which contain popular words.
• Aimed at gathering high volumes of undifferentiated traffic.
• TYPE 2: Search terms which reflect highly targeted traffic.
• Here cloaked content matches the cloaked search terms.
• TYPE 1: Use popular trending search terms.
• Google Hot Searches – sheds light on search-engine-based data collection methods.
• Alexa – client-based data collection methods.
• Twitter terms clue us in on social networking trends.
• Cloaked page entirely unrelated to the trending search terms.
• TYPE 2: A set of terms catering to a specific domain.
• Content of the cloaked pages actually matches the search terms.
Querying Search Results
• Terms collected in the previous step are fed to the search engines.
• Study the prevalence of cloaking across engines.
• Examine their response to cloaking.
• Top 100 search results and accompanying metadata compiled into a list.
• “Known good” domain entries eliminated in order to reduce false positives during data processing.
• Similar entries are grouped together with an appropriate ‘count’.
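The list-compilation step above can be sketched as follows. The whitelist of “known good” domains here is a placeholder, not the one Dagger actually used.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative whitelist; the real system's list would be far larger.
KNOWN_GOOD = {"wikipedia.org", "amazon.com"}

def compile_result_list(result_urls):
    """Drop results on known-good domains, then group duplicate URLs
    together with a count."""
    counts = Counter()
    for url in result_urls:
        host = urlparse(url).hostname or ""
        # Skip hosts that are (or fall under) a whitelisted domain.
        if any(host == d or host.endswith("." + d) for d in KNOWN_GOOD):
            continue
        counts[url] += 1
    return counts

results = compile_result_list([
    "http://en.wikipedia.org/wiki/Cloaking",
    "http://spam.example/page",
    "http://spam.example/page",
])
# results == Counter({"http://spam.example/page": 2})
```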
Crawling Search Results
• Crawl the URLs.
• Process the fetched pages.
• Detect cloaking in parallel.
• Helps minimize any possible time-of-day effects.
• Multiple crawls:
• Normal search user
• Googlebot Web crawler
• A user who does not click through the search result
• Detect pure user-agent cloaking without any checks on the referrer.
• 35% of cloaked search results for a single measurement perform pure user-agent cloaking.
• Pages that employ both user-agent and referrer cloaking are nearly always malicious.
• IP Cloaking – half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
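The three crawl profiles can be sketched as header sets applied to the same URL; comparing the responses pairwise isolates each cloaking type. The header values are illustrative, and `fetch` stands in for whatever HTTP client the crawler uses.

```python
CRAWL_PROFILES = {
    # A normal search user who clicked through from the results page.
    "user": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://www.google.com/search",
    },
    # The Googlebot crawler identity.
    "googlebot": {
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)",
    },
    # A user who does not click through: browser UA, no Referer.
    # Differences vs. "user" expose referrer cloaking; differences
    # vs. "googlebot" expose pure user-agent cloaking.
    "no_clickthrough": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    },
}

def fetch_views(url, fetch):
    """Fetch one URL under every profile. `fetch(url, headers)` is a
    caller-supplied HTTP function (e.g. wrapping requests.get)."""
    return {name: fetch(url, headers)
            for name, headers in CRAWL_PROFILES.items()}
```

Note that this header-based approach cannot defeat IP cloaking: a server doing reverse DNS on the crawler's IP will see through a forged Googlebot user-agent, which is why half of cloaked results still serve the crawler differently.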
Detecting Cloaking
• Process the crawled data using multiple iterative passes.
• Various transformations and analyses are applied.
• This helps compile the information needed to detect cloaking.
• Each pass uses a comparison-based approach:
• Apply the same transformations to the views of the same URL, as seen from the user and the crawler.
• Directly compare the results of the transformation using a scoring function.
• Thresholding – detect pages that are actively cloaking and annotate them.
• Used for later analysis.
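A minimal sketch of one such comparison pass, assuming a simple word-set similarity; Dagger's actual transformations and scoring function are richer, and the threshold value here is illustrative, not the paper's.

```python
import re

def normalize(html: str) -> set:
    """Transformation applied identically to both views: strip tags,
    lowercase, and reduce the page to its set of words."""
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return set(re.findall(r"[a-z]+", text))

def cloaking_score(user_view: str, crawler_view: str) -> float:
    """Jaccard distance between the two normalized views:
    0.0 = identical pages, 1.0 = completely different pages."""
    a, b = normalize(user_view), normalize(crawler_view)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

THRESHOLD = 0.7  # illustrative cutoff

def is_cloaked(user_view: str, crawler_view: str) -> bool:
    """Thresholding step: flag pages whose views diverge too much."""
    return cloaking_score(user_view, crawler_view) > THRESHOLD
```

Applying the same normalization to both views before scoring is what keeps benign dynamic content (ads, timestamps) from triggering false positives.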
Temporal Re-measurement
• To study the lifetime of cloaked pages, Dagger includes a temporal component.
• Fetch search results from search engines.
• Crawl and process URLs at later instances of time.
• Measure the rate at which search engines respond to cloaking.
• Measure the duration for which pages remain cloaked.
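The re-measurement loop can be sketched as repeated crawls at increasing delays, recording each URL's cloaking status so its cloaked lifetime can be estimated. The delay schedule below is an assumption for illustration, not the paper's actual schedule.

```python
# Hypothetical re-crawl schedule, in hours after the initial crawl.
RECRAWL_DELAYS_HOURS = [1, 2, 4, 8, 12, 24, 48, 96, 168]

def cloaked_lifetime_hours(status_by_delay):
    """Given {delay_hours: still_cloaked?} observations for one URL,
    return the last delay at which it was still observed cloaked."""
    cloaked = [d for d, status in sorted(status_by_delay.items()) if status]
    return cloaked[-1] if cloaked else 0

# e.g. a page still cloaked at day 2 but clean by day 4:
lifetime = cloaked_lifetime_hours({1: True, 24: True, 48: True, 96: False})
# lifetime == 48
```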
Cloaking Over Time
• In trending searches the terms constantly change.
• Cloakers target many more search terms and a broad demographic of potential victims.
• Pharmaceutical search terms are static.
• Represent product searches in a very specific domain.
• Cloakers have much more time to perform SEO to raise the rank of their cloaked pages.
• This results in more cloaked pages in the top results.
• Blackhat SEO – artificially boosts the rankings of cloaked pages.
• Search engines detect cloaking either directly (analyzing pages) or indirectly (updating the ranking algorithm).
Sources of Search Terms
• Augmenting popular search terms with suggestions.
• Enables targeting the same semantic topic as popular search terms.
• Cloaking in search results is highly influenced by the search terms.
Search Engine Response
• Search engines try to identify and thwart cloaking.
• Cloaked pages do regularly appear in search results.
• Many are removed or suppressed by the search engines within hours to a day.
• Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.
Cloaking Duration
• Cloakers manage their pages similarly, independent of the search engine.
• Pages are cloaked for long durations: over 80% remain cloaked past seven days.
• Cloakers want to maximize the time during which they can reap the benefits of cloaking by attracting customers to scam sites, or victims to malware sites.
• It is difficult to recycle a cloaked page for reuse at a later time.
Cloaked Content
• Redirection of users through a chain of advertising networks.
• About half of the time a cloaked search result leads to some form of abuse.
• Long-term SEO campaigns constantly change the search terms they are targeting and the hosts they are using.
Domain Infrastructure
• Key resources to effectively deploy cloaking in a scam:
• Access to Web sites
• Access to domains
• For TYPE 1 terms, the majority of cloaked search results are in .com.
• For TYPE 2 terms, cloakers use the “reputation” of pages to boost their ranking in search results.
Search Engine Optimization
• Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance based on the search result positions the cloaked pages occupy.
• Cloaking TYPE 1 terms targets popular terms that are very dynamic, with limited time and heavy competition for performing SEO on those search terms.
• Cloaking TYPE 2 terms is a highly focused task on a static set of terms.
• Provides much longer time frames for performing SEO on cloaked pages for those terms.
Conclusion
• Cloaking has become a standard tool in the scammer’s toolbox.
• Cloaking adds significant complexity for differentiating legitimate Web content from fraudulent pages.
• The majority of cloaked search results remain high in the rankings for 12 hours.
• The pages themselves can persist far longer.
• Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.