URLs and Domains (SMX East 2008)
Oct 7th, 2008
URLs and Domains
Nathan Buggia, Live Search Webmaster Center
Live Search Events Confidential (Restricted) - Use pursuant to Company instructions
What’s a URL (and where search engines get stuck)
http://auto.msn.co.uk/autos/default.aspx?id=AA#found
• Protocol: http
• Hostname: auto.msn.co.uk (subdomain: auto; country-code TLD (ccTLD): .uk, via .co.uk)
• Path: /autos/default.aspx
• Query: id=AA
• Fragment: found
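As a rough illustration, Python's standard urllib.parse.urlsplit splits the example URL into the same components. The helper name describe_url is hypothetical. Note that urlsplit does not separate the subdomain from the TLD; doing that reliably needs a public-suffix list.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the components labelled on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,   # lower-cased, port stripped
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,   # stays in the browser; never sent to the server
    }

for name, value in describe_url("http://auto.msn.co.uk/autos/default.aspx?id=AA#found").items():
    print(f"{name:9} {value}")
```

Because the fragment never reaches the server, two URLs that differ only after the "#" are the same request as far as a crawler or web server is concerned.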
HTTP Status Codes
• 200 – Everything’s okay (make sure you don’t return this code on a “Page not Found” page!)
• 404 – File not found
• 301 – File has moved permanently
• 302 – File is temporarily somewhere else
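A minimal sketch of how a server might choose among these codes. The page table, redirect maps, and the choose_status function are all hypothetical, for illustration only; the point is that a missing page gets a real 404 status even if it shows a friendly "Page not Found" body.

```python
from http import HTTPStatus

# Hypothetical site data, for illustration only
PAGES = {"/": "Home", "/about": "About us"}
MOVED_PERMANENTLY = {"/about-us": "/about"}  # old URL -> new URL (for good)
MOVED_TEMPORARILY = {"/sale": "/"}           # e.g. between promotions

def choose_status(path):
    """Return (status, location) for a request path."""
    if path in PAGES:
        return HTTPStatus.OK, None
    if path in MOVED_PERMANENTLY:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED_PERMANENTLY[path]
    if path in MOVED_TEMPORARILY:
        return HTTPStatus.FOUND, MOVED_TEMPORARILY[path]  # 302
    # A friendly "Page not Found" body is fine, but send it with 404, never 200
    return HTTPStatus.NOT_FOUND, None

for p in ("/", "/about-us", "/sale", "/no-such-page"):
    status, location = choose_status(p)
    print(p, int(status), location)
```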
Robots Exclusion Protocol – Common Mistakes
microsoft.com/robots.txt does not apply to technet.microsoft.com; each subdomain needs its own robots.txt file.
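One way to see the scoping rule: crawlers fetch robots.txt from the root of the exact scheme and host (and port) of the page they want, so the file that governs a page can be derived from its URL alone. A minimal sketch; robots_txt_url is a hypothetical helper and the page paths are made up.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Same scheme and host (netloc also keeps any port); path is always /robots.txt
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://microsoft.com/downloads/"))
print(robots_txt_url("http://technet.microsoft.com/en-us/library/"))
```

The two pages resolve to different files, which is why rules placed in microsoft.com/robots.txt never reach technet.microsoft.com.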
Robots Exclusion Protocol
http://janeandrobot.com/post/Managing-Robots-Access-To-Your-Website.aspx
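To check how a crawler would read a set of rules before publishing them, Python's standard urllib.robotparser can parse robots.txt text directly. The rules and URLs below are hypothetical; "msnbot" was the Live Search crawler's user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse lines instead of fetching a URL

for url in ("http://mysite.com/",
            "http://mysite.com/search?q=cars",
            "http://mysite.com/private/report.html"):
    print(url, parser.can_fetch("msnbot", url))
```

Disallow rules are prefix matches, so "Disallow: /search" also blocks every /search?... URL.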
Parameter Tracking
• http://mysite.com/?from=PROMO_1
1. Trap the request
2. Create a cookie with from=PROMO_1
3. Set the Cache-Control: no-cache response header
4. Do a 301 redirect to http://mysite.com
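The four steps above can be sketched as a framework-neutral function that returns the status and headers of the redirect. handle_tracking_request is a hypothetical name, and this sketch generalizes the slide slightly by keeping the original path and any other query parameters, removing only from=.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAM = "from"  # parameter name from the slide's example URL

def handle_tracking_request(url):
    """Return (301, headers), or None if url carries no tracking parameter."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    values = [v for k, v in params if k == TRACKING_PARAM]
    if not values:
        return None  # 1. nothing to trap: serve the page normally

    # Clean target: same scheme, host and path, minus the tracking parameter
    # (fragments are never sent to the server, so there is none to keep)
    remaining = [(k, v) for k, v in params if k != TRACKING_PARAM]
    location = urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                           urlencode(remaining), ""))

    cookie = SimpleCookie()
    cookie[TRACKING_PARAM] = values[0]
    cookie[TRACKING_PARAM]["path"] = "/"

    headers = [
        ("Set-Cookie", cookie[TRACKING_PARAM].OutputString()),  # 2. remember the promo
        ("Cache-Control", "no-cache"),  # 3. keep caches from reusing this response
        ("Location", location),         # 4. permanent redirect to the clean URL
    ]
    return 301, headers

print(handle_tracking_request("http://mysite.com/?from=PROMO_1"))
```

Search engines then only ever index the clean URL, while your analytics read the campaign from the cookie.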
Duplicate Content
• When there is more than one URL for the same content
• http://oreilly.com
• http://oreilly.com/index.csp
• http://www.oreilly.com
• http://www.oreilly.com/index.csp
• https://oreilly.com
• https://oreilly.com/index.csp
Create a few simple rules that remove duplicate URLs by 301-redirecting every variation to the shortest, most authoritative URL. This is often called “Domain Canonicalization”: http://janeandrobot.com/post/canonical-url-canonicalization-domain.aspx
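A minimal sketch of such rules, applied to the six oreilly.com variations listed above. The choice of http://oreilly.com/ as the canonical form and of index.csp as the default document are assumptions for illustration; substitute whichever scheme and host your site treats as authoritative. canonical_redirect is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "oreilly.com"
KNOWN_HOSTS = {"oreilly.com", "www.oreilly.com"}
DEFAULT_DOCUMENTS = {"index.csp"}

def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if url is already canonical."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # already lower-cased by urlsplit
    if host not in KNOWN_HOSTS:
        return None  # not one of our hostnames; leave it alone
    original_path = parts.path or "/"
    path = original_path
    # Drop a trailing default document: /index.csp -> /, /books/index.csp -> /books/
    head, _, last = path.rpartition("/")
    if last in DEFAULT_DOCUMENTS:
        path = head + "/"
    if (parts.scheme, host, path) == (CANONICAL_SCHEME, CANONICAL_HOST, original_path):
        return None
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))

for u in ("http://oreilly.com", "http://www.oreilly.com/index.csp", "https://oreilly.com"):
    print(u, "->", canonical_redirect(u))
```

Each rule (scheme, www prefix, default document) is independent, so one redirect fixes all of them at once instead of chaining several hops.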
Duplicate Content – Gateway Pages
• When a log-in page or region-select page is shown in place of every URL’s content unless the visitor already has a cookie
Because search engine crawlers generally don’t accept cookies, they may see the same gateway content at every URL on your site
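To make the failure mode concrete, here is a toy model of a gateway page. The site, the "region" cookie name, and the render function are hypothetical; the sketch only shows why a cookieless visitor sees one body for every URL.

```python
# Hypothetical site: every URL shows a region-select gateway until a cookie exists
PAGES = {
    "/": "Welcome to the home page",
    "/products": "Our product catalogue",
    "/contact": "How to reach us",
}
GATEWAY = "Please select your region"

def render(path, cookies):
    """Return the body a visitor with the given cookies would see."""
    if "region" not in cookies:
        return GATEWAY  # the interstitial replaces the real content
    return PAGES.get(path, "Not found")

# A browser that already picked a region sees distinct pages...
browser_bodies = {render(p, {"region": "uk"}) for p in PAGES}
# ...but a crawler that sends no cookies sees the same body at every URL
crawler_bodies = {render(p, {}) for p in PAGES}
print(len(browser_bodies), len(crawler_bodies))  # 3 1
```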
Sitemaps
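As a sketch of what a minimal XML Sitemap looks like, following the sitemaps.org 0.9 schema (urlset, url, loc, optional lastmod), this builds one with Python's standard xml.etree.ElementTree. The URLs and dates are made up, and build_sitemap is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & < > in text
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2008-10-07
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([
    ("http://mysite.com/", "2008-10-07"),
    ("http://mysite.com/products?cat=1&page=2", None),
]))
```

List only canonical URLs in the file, so the Sitemap reinforces the duplicate-content rules rather than contradicting them.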
Great Tools From Search Engines
• Live Search (webmaster.live.com)
  • Crawl Issues (404s, Too many parameters, REP, Bad ContentType)
  • Rank Info (PageRank, DomainRank)
  • Backlinks/Outbound links
• Google (google.com/webmaster)
  • Crawl Issues (404s, REP, Timeout, Unreachable)
  • Comprehensive Link Explorer (outbound, inbound links)
  • Set WWW vs. Non-WWW
• Yahoo (SiteExplorer.search.yahoo.com)
  • Feedback on URL Parameters
  • Backlinks/Outbound links
Issues Encountered by URL
webmaster.live.com