
Page 1: Dealing with Crawlers

WELCOME

Search Engine Optimization

Dealing with Crawlers

Page 2: Dealing with Crawlers

Dealing with the Crawlers: the good & the bad robots

1. What is a robots.txt file? The Definition .......... 4

2. Structure of a robots.txt file for SEO purposes: The Syntax .......... 5

   a. Standard: User-agent, Disallow .......... 6

   b. Nonstandard: Crawl-delay, Allow, Sitemap .......... 7

   c. Extended: Request-rate, Visit-time, Comment .......... 9

3. Effective use of a robots.txt file: Best Practices .......... 10

4. Be aware of rel="nofollow": Comment Spammers .......... 11

5. User Generated Spam: How to avoid it .......... 12

Page 3: Dealing with Crawlers
Page 4: Dealing with Crawlers

A robots.txt file is a text file placed in the root directory of your site; it is used to tell search engines which sections of the site you do not want them to crawl and index.

• It is not necessary to have a robots.txt file.

• Make sure you place the robots.txt file in the root (main) directory.

• Restrict crawling where it's not needed.

• Watch out for common robot traps: forms, logins, session IDs, and frames.
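
As a concrete illustration (a minimal sketch, not part of the original slides; the domain and paths are hypothetical), a robots.txt served at http://www.example.com/robots.txt might read:

User-agent: *
Disallow: /users/login/
Disallow: /search/

Sitemap: http://www.example.com/sitemap.xml

The file only works from the root of the host; a robots.txt placed in a subdirectory is never fetched by crawlers.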

Page 5: Dealing with Crawlers

Creating a robots.txt file is easy because its structure is simple: it is basically a list of user agents together with the files and directories to be excluded from crawling and indexing.

Standard

• User-agent:

• Disallow:

Nonstandard

• Crawl-delay:

• Allow:

• Sitemap:

Extended Standard

• Request-rate:

• Visit-time:

• Comment:

Page 6: Dealing with Crawlers

To set a value for all crawlers, use:

• User-agent: *

To set a value for a specific search engine robot, use:

• User-agent: BotName

A complete, up-to-date list of bots can be found at:

• http://www.user-agents.org/
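
For instance (an illustrative sketch, not from the original slides; the directory names are hypothetical), Googlebot is the name of Google's main crawler, so a file that applies one rule to every robot and a stricter rule only to Googlebot would look like:

User-agent: *
Disallow: /tmp/

User-agent: Googlebot
Disallow: /tmp/
Disallow: /drafts/

A robot obeys only the group that best matches its own name and falls back to the * group otherwise; groups are not combined.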

Page 7: Dealing with Crawlers

To allow all robots to visit all files:

• User-agent: *

• Disallow:

To keep all robots out:

• User-agent: *

• Disallow: /

To tell all robots not to visit

specific directories:

• User-agent: *

• Disallow: /dir_name/

• Disallow: /users/login/

To tell all robots not to visit

specific files:

• User-agent: *

• Disallow: /dir_name/file.html

To tell a specific robot not to visit
a specific directory:

• User-agent: BotName

• Disallow: /dir_name/

To tell a specific robot not to visit
a specific file:

• User-agent: BotName

• Disallow: /dir_name/file.html
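
On the crawler side, these rules can be checked programmatically. Below is a minimal sketch (not part of the original slides) using Python's standard urllib.robotparser module; the domain and the bot name MyBot are placeholders:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # fetch and parse the robots.txt file

# can_fetch() applies the User-agent / Disallow rules shown above
print(rp.can_fetch("MyBot", "http://www.example.com/dir_name/file.html"))
print(rp.can_fetch("*", "http://www.example.com/"))

A well-behaved crawler runs a check like this before every request; bad robots simply ignore the file.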

Page 8: Dealing with Crawlers

Crawl-delay Directive (seconds to wait between successive requests; honored by some engines, ignored by others):

• User-agent: *

• Crawl-delay: 10

Allow Directive (carves an exception out of a Disallow rule; note that a group must start with a User-agent line):

• User-agent: *

• Allow: /dir_name1/file_name.html

• Disallow: /dir_name1/

Sitemap Directive:

• Sitemap: http://www.domain.com/sitemap.xml

• Sitemap: http://www.domain.com/dir/s/names-sitemap.xml.gz

Page 9: Dealing with Crawlers

Request-rate Directive (pages per seconds; 1/5 means one page every five seconds):

• User-agent: *

• Disallow: /admin/

• Request-rate: 1/5

Visit-time Directive (the time window, in UTC, during which crawling is allowed):

• User-agent: *

• Disallow: /admin/

• Visit-time: 1300-2030

Comment Directive:

• User-agent: YahooSeeker/1.1

• Comment: because Yahoo sucks :P
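
Putting the pieces together (an illustrative sketch, not from the original slides; the paths are hypothetical), a complete robots.txt mixing standard, nonstandard, and extended directives might read:

User-agent: *
Disallow: /admin/          # standard
Crawl-delay: 10            # nonstandard: seconds between requests
Request-rate: 1/5          # extended: one page every five seconds
Visit-time: 1300-2030      # extended: crawl only in this UTC window

Sitemap: http://www.example.com/sitemap.xml

Crawlers that do not recognize a directive simply skip it, so only the standard User-agent and Disallow lines can be relied on everywhere.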

Page 10: Dealing with Crawlers

• Restrict crawling where it's not needed with robots.txt.

• Use more secure methods (such as authentication) for sensitive content; robots.txt only asks crawlers to stay away, it does not protect anything (see the sketch after this list).

• Avoid allowing search-result-like pages to be crawled.

• Avoid allowing URLs created as a result of proxy services to be crawled.

• Create a separate robots.txt file for each subdomain.
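
For pages that must stay out of search results but cannot sit behind a login, the usual page-level alternative is the robots META tag (a generic HTML sketch, not from the original slides):

<head>
  <!-- ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>

Unlike a robots.txt Disallow, noindex lets the crawler fetch the page but tells it to keep the page out of the index; truly sensitive content should always be protected by authentication, since both mechanisms rely on the robot's good behavior.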

Page 11: Dealing with Crawlers

<a href="http://www.domain.com" rel="nofollow">Spam</a>

• If your site has a blog with public commenting turned on, links within those comments could pass your reputation to pages that you may not be comfortable vouching for. Blog comment areas on pages are highly susceptible to comment spam. Nofollowing these user-added links ensures that you're not giving your page's hard-earned reputation to a spammy site.

Page 12: Dealing with Crawlers

• Use anti-spam tools.

• Turn on comment moderation.

• Use “nofollow” tags.

• Disallow hyperlinks in comments.

• Block comment pages using robots.txt or META tags (see the sketch after this list).

• Think twice before enabling a guestbook or comments.

• Use a blacklist to prevent repeated spamming attempts.

• Add a “report spam” feature to user profiles and friend invitations.

• Monitor your site for spammy pages.
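
As a sketch of the robots.txt approach from the list above (not from the original slides; the /comments/ directory name is hypothetical), blocking comment pages looks like this, with the page-level robots META tag shown earlier as the alternative:

User-agent: *
Disallow: /comments/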

Page 13: Dealing with Crawlers

How to get on Google’s first page

• On Site Factors.

• On Page Factors.

• Off Page Factors.

SEO for Mobile Phones

• Notify Google about mobile sites

• Guide mobile users accurately

Promotions and Analysis

• Promote your website in the right way

• Make use of free webmaster tools

Page 14: Dealing with Crawlers

THANK YOU