Robots.txt Test
What is it?
This test checks whether your website has a properly installed robots.txt file, which tells search engine robots which parts of the site they may crawl.
Pass rate:
- Top 100 websites: 99% (the percent of the 100 most visited websites in the US that passed this test in the past 12 months)
- All websites: 86% (the percent of all 500,000+ websites analyzed in SEO Site Checkup that passed this test in the past 12 months)
| Year | Pass rate |
|---|---|
| 2021 | 94% |
| 2022 | 99% |
| 2023 | 99% |
| 2024 | 99% |
How do I fix it?
To pass this test, you must create and properly install a robots.txt file.
You can write it with any program that produces a plain text file, or use an online tool (Google Webmaster Tools, now Google Search Console, has offered this feature).
Remember to use all lowercase for the filename: robots.txt, not ROBOTS.TXT.
A simple robots.txt file looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
This would block all search engine robots from visiting the "cgi-bin" and "images" directories and the page "http://www.yoursite.com/pages/thankyou.html".
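If you want to sanity-check rules like these before deploying them, Python's built-in urllib.robotparser module can evaluate a set of rules against sample URLs. A minimal sketch, reusing the placeholder domain and paths from the example above:

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, parsed directly instead of fetched from a server.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(useragent, url) reports whether the given agent may crawl the URL.
base = "http://www.yoursite.com"
for path in ("/pages/thankyou.html", "/images/logo.png", "/index.html"):
    verdict = "allowed" if parser.can_fetch("*", base + path) else "blocked"
    print(path, "->", verdict)
```

Running this prints "blocked" for the two disallowed paths and "allowed" for /index.html.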
TIPS:
- You need a separate Disallow line for every URL prefix you want to exclude.
- You may not have blank lines within a record, because blank lines are used to delimit multiple records.
- Notice the command before Disallow: User-agent: *. The User-agent: part specifies which robot the rules apply to. Major known crawlers are: Googlebot (Google), Googlebot-Image (Google Image Search), Baiduspider (Baidu), and Bingbot (Bing). A file with separate records per crawler is sketched after this list.
- One important thing to know if you are creating your own robots.txt file: although the wildcard (*) is used in the User-agent line (meaning "any robot"), the original standard does not allow it in the Disallow line (some crawlers support it there as a nonstandard extension).
- Regular expressions are not supported in either the User-agent or Disallow lines.
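For example, a file with one record per crawler might look like this, reusing the illustrative directories from the example above; note the blank line between records:

User-agent: Googlebot-Image
Disallow: /images/

User-agent: Bingbot
Disallow: /cgi-bin/

User-agent: *
Disallow: /pages/thankyou.html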
Once you have your robots.txt file, upload it to the top-level (root) directory of your web server, so that it is reachable at http://www.yoursite.com/robots.txt. After that, make sure the permissions on the file allow visitors (like search engine crawlers) to read it.
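To confirm the upload worked, you can request the file the same way a crawler would and check that it comes back readable. A minimal check, again assuming the placeholder domain from the example above:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Placeholder domain from the example above; substitute your own site.
url = "http://www.yoursite.com/robots.txt"

try:
    # Send a User-Agent header, as real crawlers do.
    response = urlopen(Request(url, headers={"User-Agent": "robots-txt-check"}))
    print("HTTP", response.status)           # expect 200
    print(response.read().decode("utf-8"))   # the rules crawlers will see
except HTTPError as err:
    print("Server responded with", err.code, "- file missing or not readable")
except URLError as err:
    print("Could not reach the server:", err.reason)
```

A 200 response whose body matches the file you wrote means the test should now pass.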