Hamlet Batista Group LLC
White Hat Cloaking – Six Practical Applications
Presented by Hamlet Batista
Why white hat cloaking?
- “Good” vs “bad” cloaking is all about your intention
- Always weigh the risks versus the rewards of cloaking
- Ask permission, or just don’t call it cloaking!
- Cloaking vs “IP delivery”
Crash course in white hat cloaking
Practical scenarios where good cloaking makes sense
1. When to cloak?
2. Practical scenarios and alternatives
3. How do we cloak?
4. How can cloaking be detected?
5. Risks and next steps
When is it practical to cloak?
Content accessibility
- Search unfriendly Content Management Systems
- Rich media sites
- Content behind forms
Membership sites
- Free and paid content
Site structure improvements
- Alternative to PageRank sculpting via “nofollow”
Geolocation/IP delivery
Multivariate testing
Practical scenario #1
Proprietary website management systems that are
not search-engine friendly
Regular users see
- URLs with many dynamic parameters
- URLs with session IDs
- URLs with canonicalization issues
- Missing titles and meta descriptions

Search engine robot sees
- Search engine friendly URLs
- URLs with a consistent naming convention
- URLs without session IDs
- Automatically generated titles and meta descriptions
Practical scenario #2
Sites built completely in Flash, Silverlight or any other rich media technology

Search engine robot sees
- A text representation of all graphical (image) elements
- A text representation of all motion (video) elements
- A text transcription of all audio in the rich media content
Practical scenario #3
Membership sites
Search users see
- Snippets of premium content on the SERPs
- When they land on the site, they are faced with a registration form

Members see
- The same content search engine robots see
Practical scenario #4
Sites requiring massive site structure changes to improve index penetration
Regular users follow a link structure designed for ease of navigation.

[Diagram: two parallel five-step link structures, one optimized for users and one for robots]

Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content.
Practical scenario #5
Sites using geolocation technology
Regular users see
- Content tailored to their geographical location and/or language

Search engine robot sees
- The same content consistently
Practical scenario #6
Split testing organic search landing pages
Each regular user sees
- One of the content experiment alternatives

Search engine robot sees
- The same content consistently
How do we cloak?
Cloaking is performed with a web server script or module
Search robot detection
- By HTTP user agent
- By IP address
- By HTTP cookie test
- By JavaScript/CSS test
- By DNS double check
- By visitor behavior
- By combining all the techniques

Content delivery
- Presenting the equivalent of the inaccessible content to robots
- Presenting the search-engine friendly content to robots
- Presenting the content behind forms to robots
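As a minimal sketch of the server-side script the slide mentions, the following WSGI app routes each request to one of two renderers. The `is_robot`, `render_for_robots` and `render_for_users` helpers are placeholders I am assuming for illustration; a real deployment would plug in the detection techniques covered next.

```python
# Minimal sketch of a server-side cloaking layer as a WSGI app.
# is_robot() and both render_* helpers are hypothetical placeholders.

def is_robot(environ):
    # Simplest possible check: a user-agent substring. Stronger
    # techniques (IP lists, double DNS checks) are covered later.
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    return "googlebot" in ua or "bingbot" in ua or "slurp" in ua

def render_for_robots():
    return b"<html><body>Crawlable, text-only version</body></html>"

def render_for_users():
    return b"<html><body>Rich interactive version</body></html>"

def app(environ, start_response):
    # Pick the variant based on who is asking, then serve it normally.
    body = render_for_robots() if is_robot(environ) else render_for_users()
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body]
```

The only cloaking-specific part is the one branch in `app`; everything else is an ordinary response pipeline.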
Robot detection by HTTP user agent
A very simple robot detection technique
Search robot HTTP request:

66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
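The request above identifies itself as Googlebot through the User-Agent header. A minimal check might look like the following sketch; the substring list is illustrative, not exhaustive.

```python
import re

# Common crawler User-Agent substrings (illustrative, not exhaustive).
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|msnbot", re.IGNORECASE)

def ua_looks_like_robot(user_agent: str) -> bool:
    # Weakness: any visitor can spoof this header, so a match should be
    # treated as a claim, not proof (see the double DNS check later).
    return bool(BOT_PATTERN.search(user_agent))
```

Because the header is trivially spoofed, this test is best used as a first filter in front of stronger verification.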
Robot detection by HTTP cookie test
Another simple robot detection technique, but weaker
Search robot HTTP request:

66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (missing cookie info)
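The idea behind the cookie test is that most crawlers neither store nor return cookies, while nearly all browsers do. A sketch, assuming a hypothetical test cookie named `seen` that the server sets on every response:

```python
def classify_by_cookie(environ):
    # Cookie test (weak): crawlers usually do not return cookies.
    # "seen" is a hypothetical cookie name, set elsewhere via
    # "Set-Cookie: seen=1" on every response.
    cookies = environ.get("HTTP_COOKIE", "")
    if "seen=1" in cookies:
        return "browser"
    # First visit OR a cookie-refusing client: only a repeat request
    # that still lacks the cookie is evidence of a robot.
    return "maybe-robot"
```

Privacy-conscious users who block cookies look like robots under this test, which is why the slide calls it weaker.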
Robot detection by JavaScript/CSS test
Another option for robot detection
DHTML Content
HTML Code
<div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
The CSS is straightforward: it replaces the text inside the header's h1 with an image.
CSS Code
/* CSS Image replacement */
#header h1 {margin:0; padding:0;}
#header h1 a {
display: block;
padding: 150px 0 0 0;
background: url(path to image) top right no-repeat;
overflow: hidden;
font-size: 1px;
line-height: 1px;
height: 0px !important;
height /**/:150px;
}
Robot detection by IP address
A more robust robot detection technique
Search robot HTTP request:

66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
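Here the source IP, not the User-Agent, drives the decision: the request comes from an address inside a range Googlebot is known to crawl from. A sketch using the standard `ipaddress` module; the network list is an illustrative snapshot, and real lists must be kept up to date.

```python
import ipaddress

# Illustrative snapshot of crawler ranges; real lists change over time
# and must be refreshed (see the detection-risk slide later).
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("66.249.64.0/19"),  # range Googlebot crawled from in this era
]

def ip_is_known_robot(ip: str) -> bool:
    # Membership test: is the visitor's address inside any known range?
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_BOT_NETWORKS)
```

Unlike the User-Agent header, the source IP of an established TCP connection cannot be trivially spoofed, which is why the slide calls this more robust.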
Robot detection by double DNS check
A more robust robot detection technique
Search robot verification:

$ nslookup 66.249.66.1
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1

$ nslookup crawl-66-249-66-1.googlebot.com
Non-authoritative answer:
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1
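The two nslookup steps above translate directly into code: reverse-resolve the IP, check the hostname is under googlebot.com, then forward-resolve that hostname and confirm it maps back to the same IP. A sketch using the standard `socket` module:

```python
import socket

def hostname_is_googlebot(host: str) -> bool:
    # The suffix check must be anchored on a dot boundary:
    # "googlebot.com.evil.example" must NOT pass.
    return host == "googlebot.com" or host.endswith(".googlebot.com")

def is_verified_googlebot(ip: str) -> bool:
    # Step 1: reverse DNS (IP -> hostname).
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not hostname_is_googlebot(host):
        return False
    # Step 2: forward DNS (hostname -> IPs) must include the original IP;
    # an attacker controls their reverse record but not Google's forward zone.
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False
```

Because each verification costs two DNS round trips, results are worth caching, as the combining slide suggests.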
Robot detection by visitor behavior
Robots differ substantially from regular users when visiting a website
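Typical behavioral signals include fetching /robots.txt (browsers never do) and requesting pages far faster than a human reads. A sketch of a rate-based tracker; the thresholds are illustrative guesses, not tuned values.

```python
import time

class BehaviorTracker:
    """Flags visitors whose behavior looks robot-like: fetching
    robots.txt or requesting many pages in a short window.
    The threshold is an illustrative guess, not a tuned value."""

    def __init__(self, max_pages_per_minute=30):
        self.max_rate = max_pages_per_minute
        self.hits = {}  # ip -> list of request timestamps

    def record(self, ip, path, now=None):
        # Keep only hits from the last 60 seconds, then add this one.
        now = time.time() if now is None else now
        recent = [t for t in self.hits.get(ip, []) if now - t < 60.0]
        recent.append(now)
        self.hits[ip] = recent

    def looks_like_robot(self, ip, path=""):
        if path == "/robots.txt":
            return True  # regular browsers never request robots.txt
        return len(self.hits.get(ip, [])) > self.max_rate
```

Behavior alone only yields suspects; the combining slide pairs it with DNS verification before acting on the label.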
Combining the best of all techniques
- Label as a robot anything that identifies itself as such
- Label as a possible robot any visitor with suspicious behavior
- Confirm it is a robot by doing a double DNS check; also confirm suspect robots
- Maintain a cache with a list of known search robots to reduce the number of verification attempts
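The four points above can be sketched as one classification function. The three helper callables (`ua_looks_like_robot`, `double_dns_verified`, `behavior_is_suspicious`) are hypothetical stand-ins for the techniques on the previous slides; the cache avoids repeating the relatively expensive DNS round trips.

```python
# Combined detection flow: cheap signals nominate candidates,
# the double DNS check confirms them, and results are cached.

verified_cache = {}  # ip -> bool, cached double-DNS verdicts

def classify(ip, user_agent,
             ua_looks_like_robot, double_dns_verified, behavior_is_suspicious):
    if ip in verified_cache:                     # known robot list first
        return "robot" if verified_cache[ip] else "user"
    claims_robot = ua_looks_like_robot(user_agent)   # self-identification
    suspicious = behavior_is_suspicious(ip)          # behavioral suspects
    if claims_robot or suspicious:
        verified_cache[ip] = double_dns_verified(ip)  # confirm and cache
        return "robot" if verified_cache[ip] else "user"
    return "user"
```

Passing the helpers as parameters keeps the sketch testable; in production they would be ordinary module functions.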
Clever cloaking detection
A clever detection technique is to check the caches at the newest datacenters.
- IP-based detection techniques rely on an up-to-date list of robot IPs
- Search engines change IPs on a regular basis
- It is possible to identify those new IPs and check the cache
Risks of cloaking
Search engines do not want to accept any type of cloaking
“Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.”
http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html

Survival tips
- The safest way to cloak is to ask for permission from each of the search engines that you care about
- Refer to it as IP delivery
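Google's md5sum test can be reproduced in a few lines: fetch the page once as a browser and once claiming to be Googlebot, and compare hashes. This is a sketch, not Google's own tool; the two User-Agent strings and the URL handling are illustrative.

```python
import hashlib
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # illustrative browser UA

def bodies_identical(a: bytes, b: bytes) -> bool:
    # Equal MD5 hashes mean byte-identical responses.
    return hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest()

def fetch(url: str, user_agent: str) -> bytes:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def serves_identical_content(url: str) -> bool:
    # Differing hashes would put the site in Google's "high-risk category".
    return bodies_identical(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))
```

Note this only detects User-Agent cloaking; IP-based cloaking would still serve both requests the user version unless run from a crawler IP.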
Next Steps
Make sure clients understand the risks/rewards of implementing
white hat cloaking
More information and how to get started
- How Google defines IP delivery, geolocation and cloaking
http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
- First Click Free http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
- Good Cloaking, Evil Cloaking and Detection
http://searchengineland.com/070301-065358.php
- YADAC: Yet Another Debate About Cloaking Happens Again
http://searchengineland.com/070304-231603.php
- Cloaking is OK Says Google http://blog.ventureskills.co.uk/2007/07/06/cloaking-is-ok-says-google/
- Advanced Cloaking Technique: How to feed password-protected content to search engine spiders
http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/
Blog http://hamletbatista.com
LinkedIn http://www.linkedin.com/in/hamletbatista
Facebook http://www.facebook.com/people/Hamlet_Batista/613808617
Twitter http://twitter.com/hamletbatista
E-mail [email protected]
Feel free to contact me. I would be happy to help.