Unraveling URLs and
Demystifying Domains
presented by Stephan Spencer,
Founder & President, Netconcepts
© 2008 Stephan M Spencer Netconcepts www.netconcepts.com [email protected]
Subdomains vs. Subdirectories
Matt Cutts'/Google's announcement: they now essentially treat
them the same
(www.mattcutts.com/blog/subdomains-and-subdirectories/)
You shouldn't treat subdomains as a means of creating
tons of easy thin-content microsites. They're being
viewed as subdirectories. Yes, use them for managing
your website and doing load balancing. No, don't use
them purely for SEO reasons.
Microsites
Can be bad for your SEO if overly numerous or if they
contain substantial amounts of duplicate content
(merely changing the UI doesn’t count)
Can be good when you’ll get more link love
– Hypothetical example: stayinghealthy.com vs.
stayinghealthy.metlife.com
Can also be beneficial in terms of demographic
targeting and focused keyword targeting
Keywords in URLs
Keywords are beneficial in Google regardless of whether they
appear in filename/directory/subdirectory names or as variable
values in query strings.
In other search engines, it is more important that the keyword
be in the filename/directory/subdirectory names. And the closer
the keyword(s) sit to the root domain name, apparently the more
weight they lend.
Just because a keyword is bolded in the SERP doesn’t
mean it’s given extra weight in the ranking algo.
Word Separators in URLs
Hyphens are best, and preferred over underscores.
– Historically, Google has not treated underscores as word separators
– Bare spaces cannot be used in URLs; the character-encoded
equivalents of a space are + and %20 (e.g.
blue%20widgets.htm). Regardless, the hyphen is preferred.
Too much of a good thing looks like keyword stuffing
– Aim for fewer than a half dozen words (i.e. <5 hyphens); see the example below
– See my Matt Cutts interview
(stephanspencer.com/search-engines/matt-cutts-interview)
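For illustration (hypothetical URLs, not from the deck): a filename like
organic-blue-widgets.htm stays within the half-dozen-word guideline,
whereas cheap-discount-organic-blue-widgets-for-sale-online.htm starts
to look like keyword stuffing.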
URL Stability
An annually recurring feature, like a Holiday Gift Buying
Guide, should have a stable URL
– When the current edition is to be retired and replaced with a
new edition, assign a new URL to the archived edition
Otherwise link juice earned over time is not carried over
to future years’ editions
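A hypothetical illustration: keep the evergreen guide at
example.com/holiday-gift-guide/ year after year, and when this year's
edition is retired, archive it at example.com/holiday-gift-guide/2008/
so the link juice accrued by the stable URL carries over to next
year's edition.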
Domain Age and Expiry
Crusty old domains (and crusty old sites) are more
trusted by Google, as alluded to in Google's
"Information retrieval based on historical data" patent
– Parked domains aren't as trusted, so get a real site up and start the clock running.
Number of years that your domain name has before
expiring may very well be a big quality indicator.
– Suggest increasing the registration period for your domain so
the expiration date will be further in the future
– Particularly for newer domains
Domain Age and Expiry
– Domainers have been known to do "tasting" (i.e.
registering domains for just a couple of days to see what
keyword traffic they get)
– Google recently announced that it will stop displaying AdSense
ads on domain-tasting sites as a measure to fight the practice
(www.informationweek.com/news/showArticle.jhtml?articleID=205918984)
Rewriting Your Spider-Unfriendly URLs
3 approaches:
1) Use a “URL rewriting” server module / plugin – such as
mod_rewrite for Apache, or ISAPI_Rewrite for IIS Server
2) Recode your scripts to extract variables out of the “path_info”
part of the URL instead of the “query_string” (see the sketch after this list)
3) Or, if IT department involvement must be minimized, use a
proxy server based solution (e.g. Netconcepts' GravityStream)
– With (1) and (2), replace all occurrences of your old URLs in
links on your site with your new search-friendly URLs. 301
redirect the old to new URLs too, so no link juice is lost.
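As a rough sketch of approach (2), assuming a PHP script like the deck's
get_product.php and assuming Apache passes the extra path data through
as PATH_INFO, the script might read its parameter from the path instead
of the query string:

<?php
// get_product.php -- hypothetical sketch: serve /get_product.php/123
// (or /products/123 with a rewrite rule in front) by reading the ID
// from PATH_INFO instead of from ?id=123.
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
if (preg_match('#^/(\d+)/?$#', $path, $matches)) {
    $product_id = (int) $matches[1];
    // ... look up and render the product exactly as before ...
} else {
    header('HTTP/1.1 404 Not Found');
    exit;
}
?>

Internal links would then point at the path-style URL, with 301 redirects
mapping the old query-string URLs onto it.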
Let’s Geek Out!
URL Rewriting – Under the Hood
If running Apache, place “rules” within .htaccess or your
Apache config file (e.g. httpd.conf, sites_conf/…)
– RewriteEngine on
– RewriteBase /
– RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
– RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]
URL Rewriting – Under the Hood
The magic of regular expressions / pattern matching
– * means 0 or more of the immediately preceding character
– + means 1 or more of the immediately preceding character
– ? means 0 or 1 occurrence of the immediately preceding character
– ^ means the beginning of the string, $ means the end of it
– . means any character (i.e. a wildcard)
– \ "escapes" the character that follows, e.g. \. means a literal dot
– [ ] is for character ranges, e.g. [A-Za-z]
– ^ inside [ ] brackets means "not", e.g. [^/]
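Tying these together, here is a hypothetical rule (the directory and
script name are invented for illustration) that uses most of the above:
– RewriteRule ^articles/([0-9]+)/([^/]+)\.html$ /article.php?year=$1&slug=$2 [L]
^ and $ anchor the whole path, [0-9] and [^/] are character ranges, +
repeats them, \. matches a literal dot, and the two () groups are
captured into $1 and $2; a request for articles/2008/blue-widgets.html
is thus internally mapped to /article.php?year=2008&slug=blue-widgets.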
URL Rewriting – Under the Hood
– () puts whatever is wrapped within it into memory
– Access what's in memory with $1 (what's in the first set of
parens), $2 (what's in the second set), and so on
Regular expression gotchas to beware of:
– "Greedy" expressions. Use a negated character class like [^/]+ instead of .*
– .* can match on nothing. Use .+ instead
– Unintentional substring matches because ^ or $ wasn’t
specified
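A quick illustration of the greediness gotcha (hypothetical path):
against category/widgets/blue.htm, the pattern ^(.*)/(.*)\.htm$ puts
category/widgets into $1 because the first .* grabs as much as it can,
while ^([^/]+)/([^/]+)\.htm$ simply fails to match a three-segment
path, so the rule only ever fires on the two-segment URLs it was
written for.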
URL Rewriting – Under the Hood
Proxy page using [P] flag
– RewriteRule /blah\.html$ http://www.google.com/ [P]
[QSA] flag is for when you don’t want query string
params dropped (like when you want a tracking param
preserved)
[L] flag saves on server processing
Got a huge pile of rewrites? Use RewriteMap and have
a lookup table as a text file
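A minimal sketch of the RewriteMap approach (hypothetical file
locations; note that RewriteMap must be declared in httpd.conf or a
virtual host config, not in .htaccess):
RewriteEngine on
RewriteMap legacyurls txt:/etc/apache2/legacy-urls.txt
RewriteCond ${legacyurls:$1} !^$
RewriteRule ^/(.+)$ ${legacyurls:$1} [R=301,L]
And in /etc/apache2/legacy-urls.txt, one old-path-to-new-URL pair per line:
old-category.html /category/207.htm
about_us.asp /about/
Any requested path that appears as a key in the text file gets
301-redirected to its replacement; everything else passes through
untouched.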
If You’re on Microsoft IIS Server
ISAPI_Rewrite is not that different from mod_rewrite
In httpd.ini :
– [ISAPI_Rewrite]
RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
– Lets a dynamic URL like
http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207
be exposed as something like http://www.example.com/category/207.htm
(requests for the friendly URL are served by the dynamic one behind the scenes)
301 Redirects – Under the Hood
In .htaccess (or httpd.conf), you can redirect individual
URLs, the contents of directories, entire domains… :
– Redirect 301 /old_url.htm http://www.example.com/new_url.htm
– Redirect 301 /old_dir/ http://www.example.com/new_dir/
– Redirect 301 / http://www.example.com
Pattern matching can be done with RedirectMatch 301
– RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
301 Redirects – Under the Hood
Or use a rewrite rule with the [R=301] flag
– RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
– RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
[NC] flag makes the rewrite condition case-insensitive
Conditional Redirects, Under the Hood
Selectively redirect bots that request URLs with session
IDs to the URL sans session ID:
– RewriteCond %{QUERY_STRING} PHPSESSID
RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
RewriteRule ^/(.*)$ /$1? [R=301,L]
(the trailing ? on /$1? drops the query string, and with it the
session ID, so the redirect doesn't loop)
Utilize browscap.ini instead of having to keep up with
each spider’s name and version changes
URLs that Lead to Error Pages
Traditional approach is to serve up a 404, which drops
that obsolete or wrong URL out of the search indexes.
This squanders the link juice to that page.
But what if you return a 200 status code instead, so that
the spiders follow the links? Then include a meta robots
noindex so the error page itself doesn’t get indexed.
Or do a 301 redirect to something valuable (e.g. your
home page) and have that destination dynamically display a
small error notice?
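A minimal sketch of the 200-plus-noindex variant (a hypothetical error
template, not from the deck):

<?php
// error.php -- hypothetical catch-all error page: respond 200 so spiders
// keep following the links on the page, but keep the page itself out of
// the index with meta robots noindex.
header('HTTP/1.1 200 OK');
?>
<html>
<head>
<meta name="robots" content="noindex,follow">
<title>Sorry, we couldn't find that page</title>
</head>
<body>
<p>Sorry, that page seems to have moved or no longer exists.</p>
<!-- ... links to popular categories and site navigation here ... -->
</body>
</html>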
Thanks!
This PowerPoint can be downloaded from
www.netconcepts.com/learn/unraveling-urls.ppt
For a 180-minute screencast (including 90 minutes of Q&A)
on SEO for large, dynamic websites, taught by Chris Smith
and me (transcripts included), email [email protected]
Questions after the show? Email me at
[email protected]