White Hat Cloaking – Six Practical Applications
Presented by Hamlet Batista
Why white hat cloaking?
“Good” vs “bad” cloaking is all about your intention
Always weigh the risks versus the rewards of cloaking
Ask permission, or just don’t call it cloaking!
Cloaking vs “IP delivery”
Crash course in white hat cloaking
Practical scenarios where good cloaking makes sense
1. When to cloak?
2. Practical scenarios and alternatives
3. How do we cloak?
4. How can cloaking be detected?
5. Risks and next steps
When is it practical to cloak?
• Content accessibility
- Search-unfriendly content management systems
- Rich media sites
- Content behind forms
• Membership sites
- Free and paid content
• Site structure improvements
- Alternative to PageRank sculpting via “nofollow”
• Geolocation/IP delivery
• Multivariate testing
Practical scenario #1
Proprietary website management systems that are not search-engine friendly
Regular users see:
- URLs with many dynamic parameters
- URLs with session IDs
- URLs with canonicalization issues
- Missing titles and meta descriptions
Search engine robot sees:
- Search-engine friendly URLs
- URLs with a consistent naming convention
- URLs without session IDs
- Automatically generated titles and meta descriptions
Practical scenario #2
Sites built completely in Flash, Silverlight or any other rich media technology
Search engine robot sees:
- A text representation of all graphical (image) elements
- A text representation of all motion (video) elements
- A text transcription of all audio in the rich media content
Practical scenario #3
Membership sites
Search users see:
- Snippets of premium content on the SERPs
- A registration form when they land on the site
Members see:
- The same content search engine robots see
Practical scenario #4
Sites requiring massive site structure changes to improve index penetration
Regular users follow a link structure designed for ease of navigation.
Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content.
[Diagram: two parallel five-step link structures, one for users and one for robots]
Practical scenario #5
Sites using geolocation technology
Regular users see:
- Content tailored to their geographical location and/or language
Search engine robot sees:
- The same content consistently
Practical scenario #6
Split testing organic search landing pages
Each regular user sees:
- One of the content experiment alternatives
Search engine robot sees:
- The same content consistently
How do we cloak?
Cloaking is performed with a web server script or module.
Search robot detection
• By HTTP user agent
• By IP address
• By HTTP cookie test
• By JavaScript/CSS test
• By DNS double check
• By visitor behavior
• By combining all the techniques
Content delivery
• Presenting the equivalent of the inaccessible content to robots
• Presenting the search-engine friendly content to robots
• Presenting the content behind forms to robots
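To make the flow concrete, here is a minimal sketch of the pattern as a Python WSGI application. The function names and the two content variants are illustrative, not from the talk; the detection techniques it would plug in are sketched on the slides that follow.

def is_search_robot(environ):
    # Placeholder: combine the detection techniques on the next slides.
    return "Googlebot" in environ.get("HTTP_USER_AGENT", "")

def application(environ, start_response):
    # Deliver the search-engine friendly variant to robots,
    # the regular variant to everyone else.
    start_response("200 OK", [("Content-Type", "text/html")])
    if is_search_robot(environ):
        return [b"<html>search-engine friendly version</html>"]
    return [b"<html>regular rich-media version</html>"]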
Robot detection by HTTP user agent
A very simple robot detection technique
Search robot HTTP request (log entry):
66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
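A minimal sketch of the check, assuming the user agent string comes from the request headers; the token list is illustrative (Googlebot, Yahoo Slurp and msnbot were the major crawlers at the time). Since any client can claim any user agent, a match is only a claim to be verified, as the double DNS check below does.

KNOWN_ROBOT_AGENTS = ("Googlebot", "Slurp", "msnbot")

def claims_to_be_robot(user_agent):
    # True if the user agent string names a known search robot.
    return any(token in user_agent for token in KNOWN_ROBOT_AGENTS)

ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(claims_to_be_robot(ua))  # True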
Robot detection by HTTP cookie test
Another simple robot detection technique, but weaker
Search robot HTTP request (note the missing cookie information in the last field):
66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "Missing cookie info"
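A sketch of the idea, again as a WSGI application; the cookie name is illustrative. Most crawlers do not keep cookies, so a visitor that never returns the cookie we set is a robot suspect.

def application(environ, start_response):
    # Flag visitors that never send back the cookie we set earlier.
    robot_suspect = "seen=1" not in environ.get("HTTP_COOKIE", "")
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Set-Cookie", "seen=1; Path=/")])
    return [b"robot suspect" if robot_suspect else b"returning browser"]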
Robot detection by JavaScript/CSS test
Another option for robot detection
HTML Code
<div id="header"><h1><a href="http://www.example.com" title="Example Site">Example site</a></h1></div>
The CSS code is straightforward: it swaps out anything in the h1 tag in the header with an image.
CSS Code
/* CSS image replacement */
#header h1 {margin: 0; padding: 0;}
#header h1 a {
  display: block;
  padding: 150px 0 0 0;
  background: url(path to image) top right no-repeat;
  overflow: hidden;
  font-size: 1px;
  line-height: 1px;
  height: 0px !important; /* modern browsers collapse the link text */
  height /**/: 150px; /* fallback for old IE, which ignores the !important above */
}
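The JavaScript half of the test can mirror the cookie test: serve a small script in the page, for example <script>document.cookie = "js=1; path=/";</script>, and treat visitors that never send the marker back as robots, since crawlers of that era generally did not execute JavaScript. A sketch, with the cookie name again illustrative:

def passed_js_test(environ):
    # True when the client executed our script and returned its cookie.
    return "js=1" in environ.get("HTTP_COOKIE", "")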
Robot detection by IP address
A more robust robot detection technique
Search robot HTTP request (the check keys on the requesting IP address):
66.249.66.1 [04/Mar/2008:00:20:56 -0500] "GET /2007/11/13/game-plan-what-marketers-can-learn-from-strategy-games/ HTTP/1.1" 200 61477 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
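A sketch using Python's standard ipaddress module. The 66.249.64.0/19 block is one range Googlebot has historically crawled from, listed here as an assumption; as the detection slide below notes, IP lists go stale, so a real deployment needs a maintained list.

import ipaddress

KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

def ip_is_known_robot(remote_addr):
    # True when the visitor's IP falls inside a known robot network.
    ip = ipaddress.ip_address(remote_addr)
    return any(ip in net for net in KNOWN_ROBOT_NETWORKS)

print(ip_is_known_robot("66.249.66.1"))  # True, per the log line above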
Robot detection by double DNS check
A more robust robot detection technique
First, a reverse DNS lookup of the requesting IP:
> nslookup 66.249.66.1
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1
Then a forward lookup to confirm the name resolves back to the same IP:
> nslookup crawl-66-249-66-1.googlebot.com
Non-authoritative answer:
Name: crawl-66-249-66-1.googlebot.com
Address: 66.249.66.1
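A sketch of the same two lookups with Python's standard socket module. The googlebot.com suffix matches the example above; the other suffixes are the reverse-DNS domains the major engines used at the time and should be treated as assumptions.

import socket

CRAWLER_DOMAINS = (".googlebot.com", ".search.msn.com", ".crawl.yahoo.net")

def verified_robot(remote_addr):
    try:
        hostname, _, _ = socket.gethostbyaddr(remote_addr)  # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith(CRAWLER_DOMAINS):
        return False  # not a recognized crawler hostname
    try:
        # Forward lookup must map back to the original IP.
        return socket.gethostbyname(hostname) == remote_addr
    except socket.gaierror:
        return False

print(verified_robot("66.249.66.1"))  # True for the crawl host above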
Robot detection by visitor behavior
Robots differ substantially from regular users when visiting a website.
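The slide leaves the behavioral signals open, so the ones in this sketch (a hit on robots.txt, a sustained high request rate, no referrer) are assumptions chosen for illustration, not the talk's prescription.

import time
from collections import defaultdict, deque

recent_hits = defaultdict(deque)  # IP -> timestamps of recent requests

def behaves_like_robot(ip, path, referrer):
    now = time.time()
    hits = recent_hits[ip]
    hits.append(now)
    while hits and now - hits[0] > 60:
        hits.popleft()  # keep a one-minute window
    if path == "/robots.txt":
        return True  # regular users never request robots.txt
    # Sustained, referrer-less traffic is typical of crawlers.
    return len(hits) > 30 and referrer in ("", "-")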
Combining the best of all techniques
• Label as a robot any visitor that identifies itself as one
• Label as a possible robot any visitor with suspicious behavior
• Confirm it is a robot by doing a double DNS check; also confirm the suspected robots
• Maintain a cache with a list of known search robots to reduce the number of verification attempts
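Putting it together, a sketch that reuses the illustrative helpers from the previous slides (claims_to_be_robot, behaves_like_robot, verified_robot) and caches verdicts per IP so each robot is verified only once:

verified_cache = {}  # IP -> bool, cached double-DNS verdicts

def is_search_robot(ip, user_agent, path, referrer):
    if ip in verified_cache:
        return verified_cache[ip]  # known robot list, no re-verification
    suspect = claims_to_be_robot(user_agent) or behaves_like_robot(ip, path, referrer)
    if not suspect:
        return False
    verified_cache[ip] = verified_robot(ip)  # double DNS confirms the claim
    return verified_cache[ip]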
Clever cloaking detection
A clever detection technique is to check the caches at the newest datacenters.
• IP-based detection techniques rely on an up-to-date list of robot IPs
• Search engines change IPs on a regular basis
• It is possible to identify those new IPs and check the cache
Risks of cloaking
Search engines do not want to accept any type of cloaking:
• “Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you're in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.”
• http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
Survival tips
• The safest way to cloak is to ask for permission from each of the search engines that you care about
• Refer to it as IP delivery
Next Steps
• Make sure clients understand the risks/rewards of implementing white hat cloaking
• More information and how to get started
- How Google defines IP delivery, geolocation and cloaking http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html
- First Click Free http://googlenewsblog.blogspot.com/2007/09/first-click-free.html
- Good Cloaking, Evil Cloaking and Detection http://searchengineland.com/070301-065358.php
- YADAC: Yet Another Debate About Cloaking Happens Again http://searchengineland.com/070304-231603.php
- Cloaking is OK Says Google http://blog.ventureskills.co.uk/2007/07/06/cloaking-is-ok-says-google/
- Advanced Cloaking Technique: How to feed password-protected content to search engine spiders http://hamletbatista.com/2007/09/03/advanced-cloaking-technique-how-to-feed-password-protected-content-to-search-engine-spiders/
Feel free to contact me; I would be happy to help.
• Blog http://hamletbatista.com
• LinkedIn http://www.linkedin.com/in/hamletbatista
• Facebook http://www.facebook.com/people/Hamlet_Batista/613808617
• Twitter http://twitter.com/hamletbatista
• E-mail [email protected]