Dealing with Crawlers
TRANSCRIPT
Search Engine Optimization

Dealing with the Crawlers: the good & the bad robots
1. What is a robots.txt file? The Definition
2. Structure of the robots.txt file for SEO purposes: The Syntax
   a. Standard: User-agent, Disallow
   b. Nonstandard: Crawl-delay, Allow, Sitemap
   c. Extended: Request-rate, Visit-time, Comment
3. Effective use of the robots.txt file: Best Practices
4. Be aware of rel="nofollow": Comment Spammers
5. User Generated Spam: How to Avoid It
A robots.txt file is a text file placed in the root directory of a
site; it is used to tell search engines which sections of the site
you do not want them to crawl and index.
• It is not mandatory to have a robots.txt file.
• Make sure you place the robots.txt file in the root directory.
• Restrict crawling where it's not needed.
• Common robot traps: forms, logins, session IDs, frames.
Creating a robots.txt file is easy because its structure is simple:
it is basically a list of user agents together with the files and
directories to be excluded from crawling and indexing.
Standard
• User-agent:
• Disallow:
Nonstandard
• Crawl-delay:
• Allow:
• Sitemap:
Extended
• Request-rate:
• Visit-time:
• Comment:
If you would like to set a value for all crawlers, use:
• User-agent: *
If you would like to set a value for a specific search engine robot, use:
• User-agent: BotName
A complete, updated list of bots can be found at:
• http://www.user-agents.org/
To allow all robots to visit all files:
• User-agent: *
• Disallow:
To keep all robots out:
• User-agent: *
• Disallow: /
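As a quick sanity check, these two rule sets can be exercised with Python's standard-library `urllib.robotparser`; the user-agent name and the path below are illustrative, not part of the slides:

```python
from urllib.robotparser import RobotFileParser

# Rules that keep all robots out: "Disallow: /" blocks every path.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# Rules that let all robots in: an empty Disallow blocks nothing.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

print(block_all.can_fetch("AnyBot", "/any/page.html"))  # False (blocked)
print(allow_all.can_fetch("AnyBot", "/any/page.html"))  # True (allowed)
```

`parse()` accepts the file's lines directly, which makes it handy for testing rules before uploading them.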
To tell all robots not to visit specific directories:
• User-agent: *
• Disallow: /dir_name/
• Disallow: /users/login/
To tell all robots not to visit specific files:
• User-agent: *
• Disallow: /dir_name/file.html
To tell a specific robot not to visit a specific directory:
• User-agent: BotName
• Disallow: /dir_name/
To tell a specific robot not to visit a specific file:
• User-agent: BotName
• Disallow: /dir_name/file.html
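A bot-specific rule only affects the robot it names; everything else stays crawlable. A small sketch with `urllib.robotparser`, reusing the placeholder `BotName` and paths from the example above (`OtherBot` is an assumed second crawler name):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: BotName",
    "Disallow: /dir_name/",
])

# The named robot is kept out of the directory...
print(rp.can_fetch("BotName", "/dir_name/file.html"))   # False
# ...but other robots, and other paths, are unaffected.
print(rp.can_fetch("OtherBot", "/dir_name/file.html"))  # True
print(rp.can_fetch("BotName", "/public/file.html"))     # True
```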
Crawl-delay Directive:
• User-agent: *
• Crawl-delay: 10
Allow Directive:
• Allow: /dir_name1/file_name.html
• Disallow: /dir_name1/
Sitemap Directive:
• Sitemap: http://www.domain.com/sitemap.xml
• Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz
Request-rate Directive:
• User-agent: *
• Disallow: /admin/
• Request-rate: 1/5
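Support for the nonstandard directives varies between crawlers, but Python's `urllib.robotparser` understands Crawl-delay, Request-rate, Allow, and Sitemap, so they can be checked the same way (`site_maps()` requires Python 3.8+; the file below just combines the slide examples):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Request-rate: 1/5",
    "Allow: /dir_name1/file_name.html",
    "Disallow: /dir_name1/",
    "Sitemap: http://www.domain.com/sitemap.xml",
])

print(rp.crawl_delay("*"))          # 10
rate = rp.request_rate("*")
print(rate.requests, rate.seconds)  # 1 5 -> one request per 5 seconds
print(rp.site_maps())               # ['http://www.domain.com/sitemap.xml']

# Allow before Disallow: the listed file stays fetchable even though
# its directory is blocked (this parser applies the first matching rule).
print(rp.can_fetch("SomeBot", "/dir_name1/file_name.html"))  # True
print(rp.can_fetch("SomeBot", "/dir_name1/other.html"))      # False
```

Note that major crawlers resolve Allow/Disallow conflicts differently (e.g. by longest matching path), so it is safest to list Allow lines before the Disallow they carve an exception into, as above.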
Visit-time Directive:
• User-agent: *
• Disallow: /admin/
• Visit-time: 1300-2030
Comment Directive:
• User-agent: YahooSeeker/1.1
• Comment: because Yahoo sucks :P
• Restrict crawling where it's not needed with robots.txt.
• Use more secure methods for sensitive content.
• Avoid allowing search result-like pages to be crawled.
• Avoid allowing URLs created as a result of proxy services to be crawled.
• Create a separate robots.txt file for each subdomain.
<a href="http://www.domain.com" rel="nofollow">Spam</a>
• If your site has a blog with public commenting turned on, links within
those comments could pass your reputation to pages that you may not
be comfortable vouching for. Blog comment areas on pages are highly
susceptible to comment spam. Nofollowing these user-added links
ensures that you're not giving your page's hard-earned reputation to a
spammy site.
• Use anti-spam tools.
• Turn on comment moderation.
• Use "nofollow" tags.
• Disallow hyperlinks in comments.
• Block comment pages using robots.txt or meta tags.
• Think twice before enabling a guestbook or comments.
• Use a blacklist to prevent repeated spamming attempts.
• Add a "report spam" feature to user profiles and friend invitations.
• Monitor your site for spammy pages.
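The "nofollow plus no raw hyperlinks" advice above can be sketched as a tiny comment-rendering helper; the function name and markup are illustrative, not a standard API. It adds rel="nofollow" so the link passes no reputation, and escapes both fields so commenters cannot inject HTML:

```python
import html

def render_comment_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" and all
    user-supplied content HTML-escaped."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        html.escape(url, quote=True), html.escape(text)
    )

print(render_comment_link("http://www.domain.com", "Spam"))
# <a href="http://www.domain.com" rel="nofollow">Spam</a>
```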
How to get on Google’s first page
• On Site Factors.
• On Page Factors.
• Off Page Factors.
SEO for Mobile Phones
• Notify Google about mobile sites
• Guide mobile users accurately
Promotions and Analysis
• Promote your website in the right way
• Make use of free webmaster tools
THANK YOU