SEO: Robots.txt File
What is the use of the robots.txt file?
What is robots.txt?
Robots.txt is a simple text file on your website that tells search engine
bots how to crawl and index the site or individual pages.
By default, search engine bots crawl everything they can reach unless they are
forbidden from doing so. They always check the robots.txt file before crawling the
website.
Declaring rules in robots.txt tells visiting bots that they should not index sensitive
data, but it does not prevent them from doing so. Legitimate bots follow the instructions
given to them, but malicious bots do not care about them, so do not rely on robots.txt as a
security measure for your website.
How to build a robots.txt file (Terms, Structure & Placement)?
The standard terms used in robots.txt and their meanings are listed below.
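User-agent - names the bot the following rules apply to (* matches every bot)
Disallow - a path the named bot should not crawl (an empty value disallows nothing)
Allow - a path that may be crawled even inside a disallowed folder (an extension supported by many engines, including Google)
# - starts a comment line, which bots ignore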
The robots.txt file is placed in the root folder of your website, so that the URL of
your robots.txt file resembles www.example.com/robots.txt in the web browser.
Remember to use all lowercase letters for the filename.
You can apply different restrictions to different bots with bot-specific rules, but be
aware that the more complicated you make the file, the harder it becomes to spot
its traps. Always specify bot-specific rules before common rules, so that a bot reading
the file to the end either finds the rules addressed to its name or falls back to the common rules.
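For example, the following file gives Googlebot its own rule ahead of the common rules (the folder names here are illustrative, not from any real site):

User-agent: googlebot
Disallow: /google-only/

User-agent: *
Disallow: /private/

(Googlebot obeys only its own block; every other bot obeys the common block.)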
You can check many other sites' robots.txt files to get a feel for how these are generally implemented:
http://www.searchenabler.com/robots.txt
http://www.google.com/robots.txt
http://searchengineland.com/robots.txt
Example scenarios for robots.txt
If you have a close look at the Search Enabler robots.txt, you will notice that we have
blocked the following pages from search indexing. You can analyze which pages and links
should be blocked on your own website. As a general rule, we advise hiding pages such
as the search results page within your website, as well as user logins, profiles, logs, and
CSS style sheets.
1. Disallow: /?s=
This is a dynamic search results page; there is no point in indexing it, and doing so
would create duplicate content problems.
2. Disallow: /blog/2010/
These are blog posts categorized in a year-wise pattern; they are blocked because they
lead to duplication errors, with different URLs pointing to the same web page.
3. Disallow: /login/
This is a login page meant only for users of the searchenabler tool, so it is blocked from
being crawled.
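Taken together, the three rules above fit in a single file. A minimal sketch of such a robots.txt, assuming the rules apply to all bots:

User-agent: *
Disallow: /?s=
Disallow: /blog/2010/
Disallow: /login/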
How does robots.txt affect search results?
By using the robots.txt file, you can hide pages such as user profiles and temporary
folders from being indexed, so that your SEO effort is not diluted by junk or by pages
that are useless in the search results. In general, your results will be more precise
and better valued.
Default Robots.txt
The default robots.txt file basically tells every crawler that it may visit any website
directory to its heart's content:
User-agent: *
Disallow:
(which translates as “disallow nothing”)
The often-asked question here is why use it at all. Well, it is not required, but it is
recommended for the simple reason that search bots will request it anyway (this
means you'll see 404 errors in your log files from bots requesting your non-existent
robots.txt page). Besides, having a default robots.txt ensures there won't be any
misunderstandings between your site and a crawler.
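As a quick sanity check, you can verify how a parser reads such a file. Here is a minimal sketch using Python's standard urllib.robotparser module (the example.com URL is purely illustrative):

from urllib.robotparser import RobotFileParser

# The default "disallow nothing" file from above.
lines = ["User-agent: *", "Disallow:"]

parser = RobotFileParser()
parser.parse(lines)

# Every URL is allowed for every user agent.
print(parser.can_fetch("*", "http://www.example.com/any/page.html"))  # True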
Robots.txt Blocking Specific Folders / Content:
The most common usage of robots.txt is to ban crawlers from visiting private folders or
content that gives them no additional information. This is done primarily to save
the crawler's time: bots crawl on a budget, and if you ensure they don't waste time on
unnecessary content, they will crawl your site deeper and quicker.
Samples of robots.txt files blocking specific content (note: only a few of the most
basic cases are highlighted):
User-agent: *
Disallow: /database/
(blocks all crawlers from the /database/ folder)
User-agent: *
Disallow: /*?
(blocks all crawlers from all URLs containing ?; note that the * wildcard is, like the Allow statement below, an extension supported by major engines rather than part of the original standard)
User-agent: *
Disallow: /navy/
Allow: /navy/about.html
(blocks all crawlers from the /navy/ folder but allows access to one page in that folder)
Note from John Mueller, commenting below:
the "Allow:" statement is not part of the robots.txt standard (it is, however, supported
by many search engines, including Google)
Robots.txt Allowing Access to Specific Crawlers
Some people choose to save bandwidth and allow access only to those crawlers they
care about (e.g. Google, Yahoo! and MSN). In this case, the robots.txt file should name
each of those robots followed by its own rule:
User-agent: *
Disallow: /
User-agent: googlebot
Disallow:
User-agent: slurp
Disallow:
User-agent: msnbot
Disallow:
(the first part blocks all crawlers from everything, while the following three blocks name
the three crawlers that are allowed to access the whole site)
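To see this fallback behavior in action, here is a small sketch using Python's standard urllib.robotparser module (the example.com URL and the "otherbot" name are illustrative):

from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: googlebot",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(lines)

url = "http://www.example.com/page.html"
print(parser.can_fetch("googlebot", url))  # True: googlebot has its own empty Disallow
print(parser.can_fetch("otherbot", url))   # False: falls back to the catch-all block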
Need Advanced Robots.txt Usage?
I tend to recommend that people refrain from doing anything too tricky in their robots.txt
file unless they are 100% knowledgeable in the topic. A messed-up robots.txt file can
ruin a project launch.
Many people spend weeks and months trying to figure out why their site is ignored by
crawlers until they realize (often with some external help) that they have misused their
robots.txt file. A better solution for controlling crawler activity may be to rely on
on-page solutions (robots meta tags). Aaron did a great job summing up the
difference in his guide (bottom of the page).
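For reference, a robots meta tag sits in the <head> section of an individual page; a minimal example:

<meta name="robots" content="noindex, follow">

(this tells compliant crawlers not to index that page while still following its links)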
Best Robots.txt Tools: Generators and Analyzers
While I do not encourage anyone to rely too much on robots.txt tools (you should either
do your best to understand the syntax yourself or turn to an experienced consultant
to avoid any issues), the robots.txt generators and checkers listed below will
hopefully be of additional help:
Robots.txt generators:
Common procedure:
1. choose default / global commands (e.g. allow/disallow all robots);
2. choose files or directories blocked for all robots;
3. choose user-agent-specific commands:
   1. choose an action;
   2. choose a specific robot to be blocked.
As a general rule of thumb, I don't recommend using robots.txt generators, for a
simple reason: don't create any advanced (i.e. non-default) robots.txt file until you are
100% sure you understand what you are blocking with it. Still, here are the two most
trustworthy generators to check:
Google Webmaster Tools: the Robots.txt generator allows you to create simple
robots.txt files. What I like most about this tool is that it automatically adds all
global commands to each specific user agent's commands (thus helping to
avoid one of the most common mistakes).
SEObook's Robots.txt generator unfortunately misses the above feature, but it is
really easy (and fun) to use.
Robots.txt checkers:
Google Webmaster Tools: the Robots.txt analyzer "translates" what your robots.txt
dictates to the Googlebot.
Robots.txt Syntax Checker finds some common errors in your file by
checking for whitespace-separated lists, standards that are not widely supported,
wildcard usage, etc.
A Validator for Robots.txt Files also checks for syntax errors and confirms correct
directory paths.