Dealing with Crawlers
TRANSCRIPT
Search Engine Optimization

Dealing with the Crawlers: the good & the bad robots
1. What is a robots.txt file? The Definition
2. Structure of the robots.txt file for SEO purposes: The Syntax
   a. Standard: User-agent, Disallow
   b. Nonstandard: Crawl-delay, Allow, Sitemap
   c. Extended: Request-rate, Visit-time, Comment
3. Effective use of the robots.txt file: Best Practices
4. Be aware of rel="nofollow": Comment Spammers
5. User Generated Spam: How to Avoid It
A robots.txt file is a text file placed in the root directory of a
site; it is used to tell search engines which sections of the site
you do not want them to crawl and index.
• It is not mandatory to have a robots.txt file.
• Make sure you place the robots.txt file in the root directory.
• Restrict crawling where it's not needed.
• Common robot traps: forms, logins, session IDs, frames.
Creating a robots.txt file is easy because its structure is simple:
it is basically a list of user agents together with the files and
directories to be excluded from crawling and indexing.
Standard
• User-agent:
• Disallow:
Nonstandard
• Crawl-delay:
• Allow:
• Sitemap:
Extended
• Request-rate:
• Visit-time:
• Comment:
If you would like to set a value for all crawlers, use:
• User-agent: *
If you would like to set a value for a specific search engine robot, use:
• User-agent: BotName
A complete, updated list of bots can be found at:
• http://www.user-agents.org/
To allow all robots to visit all files:
• User-agent: *
• Disallow:
To keep all robots out:
• User-agent: *
• Disallow: /
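As a quick sanity check, these two rule sets can be exercised with Python's standard-library `urllib.robotparser`; the user-agent name and the path below are illustrative, not part of the slides:

```python
from urllib.robotparser import RobotFileParser

# Rules that keep all robots out: "Disallow: /" blocks every path.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# Rules that let all robots in: an empty Disallow blocks nothing.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

print(block_all.can_fetch("AnyBot", "/any/page.html"))  # False (blocked)
print(allow_all.can_fetch("AnyBot", "/any/page.html"))  # True (allowed)
```

`parse()` accepts the file's lines directly, which makes it handy for testing rules before uploading them.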
To tell all robots not to visit specific directories:
• User-agent: *
• Disallow: /dir_name/
• Disallow: /users/login/
To tell all robots not to visit specific files:
• User-agent: *
• Disallow: /dir_name/file.html
To tell a specific robot not to visit a specific directory:
• User-agent: BotName
• Disallow: /dir_name/
To tell a specific robot not to visit a specific file:
• User-agent: BotName
• Disallow: /dir_name/file.html
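A bot-specific rule only affects the robot it names; everything else stays crawlable. A small sketch with `urllib.robotparser`, reusing the placeholder `BotName` and paths from the example above (`OtherBot` is an assumed second crawler name):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: BotName",
    "Disallow: /dir_name/",
])

# The named robot is kept out of the directory...
print(rp.can_fetch("BotName", "/dir_name/file.html"))   # False
# ...but other robots, and other paths, are unaffected.
print(rp.can_fetch("OtherBot", "/dir_name/file.html"))  # True
print(rp.can_fetch("BotName", "/public/file.html"))     # True
```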
Crawl-delay Directive:
• User-agent: *
• Crawl-delay: 10
Allow Directive:
• Allow: /dir_name1/file_name.html
• Disallow: /dir_name1/
Sitemap Directive:
• Sitemap: http://www.domain.com/sitemap.xml
• Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz
Request-rate Directive:
• User-agent: *
• Disallow: /admin/
• Request-rate: 1/5
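Support for the nonstandard directives varies between crawlers, but Python's `urllib.robotparser` understands Crawl-delay, Request-rate, Allow, and Sitemap, so they can be checked the same way (`site_maps()` requires Python 3.8+; the file below just combines the slide examples):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Request-rate: 1/5",
    "Allow: /dir_name1/file_name.html",
    "Disallow: /dir_name1/",
    "Sitemap: http://www.domain.com/sitemap.xml",
])

print(rp.crawl_delay("*"))          # 10
rate = rp.request_rate("*")
print(rate.requests, rate.seconds)  # 1 5 -> one request per 5 seconds
print(rp.site_maps())               # ['http://www.domain.com/sitemap.xml']

# Allow before Disallow: the listed file stays fetchable even though
# its directory is blocked (this parser applies the first matching rule).
print(rp.can_fetch("SomeBot", "/dir_name1/file_name.html"))  # True
print(rp.can_fetch("SomeBot", "/dir_name1/other.html"))      # False
```

Note that major crawlers resolve Allow/Disallow conflicts differently (e.g. by longest matching path), so it is safest to list Allow lines before the Disallow they carve an exception into, as above.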
Visit-time Directive:
• User-agent: *
• Disallow: /admin/
• Visit-time: 1300-2030
Comment Directive:
• User-agent: YahooSeeker/1.1
• Comment: because Yahoo sucks :P
• Restrict crawling where it's not needed with robots.txt.
• Use more secure methods for sensitive content.
• Avoid allowing search result-like pages to be crawled.
• Avoid allowing URLs created as a result of proxy services to be crawled.
• Create a separate robots.txt file for each subdomain.
<a href="http://www.domain.com" rel="nofollow">Spam</a>
• If your site has a blog with public commenting turned on, links within
those comments could pass your reputation to pages that you may not
be comfortable vouching for. Blog comment areas on pages are highly
susceptible to comment spam. Nofollowing these user-added links
ensures that you're not giving your page's hard-earned reputation to a
spammy site.
• Use anti-spam tools.
• Turn on comment moderation.
• Use "nofollow" tags.
• Disallow hyperlinks in comments.
• Block comment pages using robots.txt or meta tags.
• Think twice before enabling a guestbook or comments.
• Use a blacklist to prevent repeated spamming attempts.
• Add a "report spam" feature to user profiles and friend invitations.
• Monitor your site for spammy pages.
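The "nofollow plus no raw hyperlinks" advice above can be sketched as a tiny comment-rendering helper; the function name and markup are illustrative, not a standard API. It adds rel="nofollow" so the link passes no reputation, and escapes both fields so commenters cannot inject HTML:

```python
import html

def render_comment_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" and all
    user-supplied content HTML-escaped."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        html.escape(url, quote=True), html.escape(text)
    )

print(render_comment_link("http://www.domain.com", "Spam"))
# <a href="http://www.domain.com" rel="nofollow">Spam</a>
```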
How to get on Google’s first page
• On Site Factors.
• On Page Factors.
• Off Page Factors.
SEO for Mobile Phones
• Notify Google about mobile sites
• Guide mobile users accurately
Promotions and Analysis
• Promote your website in the right way
• Make use of free webmaster tools
THANK YOU