...

Robots.txt Blocked

Have you ever received a notification from Google Search Console saying some of your pages are “Indexed, though blocked by robots.txt”? The message can seem contradictory, because those two things shouldn’t happen at the same time.

A robots.txt file is a set of instructions for search engines. It tells them which pages on your site they should and shouldn’t crawl. For example, you might use it to block pages you don’t want to appear in search results, like private backend pages or old thin content.
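
For instance, a minimal robots.txt that lets crawlers reach everything except a hypothetical /admin/ area (the path is just a placeholder for your own) might look like this:

User-agent: *
Disallow: /admin/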

The error happens when Google discovers a page it can’t crawl because of your robots.txt file but decides to show it in search results anyway. This is usually because other websites link to the page, signaling to Google that it’s important enough to index.

How to Find This Error on Your Website

Before you can fix the problem, you need to know where it is. Open the Page indexing report in Google Search Console and look for the “Indexed, though blocked by robots.txt” status; you’ll see a list of URLs that are being indexed even though they are blocked.
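
If you want to double-check which of the reported URLs your robots.txt actually blocks, Python’s standard library can test them directly. Here is a minimal sketch; the domain and URLs are placeholders for your own:

from urllib import robotparser

# Load and parse the live robots.txt file (the URL is a placeholder).
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# can_fetch() returns False for any URL the rules block for the given user agent.
for url in ["https://www.example.com/", "https://www.example.com/admin/page"]:
    print(url, rp.can_fetch("*", url))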

If you are seeing this error, first check whether each affected URL is a page you actually want in search results. Some of them may be hidden pages or duplicate content that you deliberately blocked and genuinely want to keep out.

How to Fix This Common SEO Problem

Step 1: Check Your robots.txt File

The most common cause of this error is a misconfigured robots.txt file. Review the file and make sure you are not accidentally blocking pages that should be indexed; a single typo can prevent an entire section of your website from being crawled.
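
To see how little it takes, compare these two rules (the /private/ path is a placeholder). The first blocks only that one directory, while the second, missing just the path, blocks your entire site:

Disallow: /private/
Disallow: /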

If you want search engines to crawl your entire site, your robots.txt file should have these simple lines:

User-agent: *
Disallow:

This tells search engines that they are allowed to crawl everything on your site.

Step 2: Add a noindex Tag

The reason Google indexes a blocked page is often that other websites link to it. You can’t control what other sites do, but you can use a noindex tag to solve the problem.

A noindex tag is a line of code you place in the <head> section of your page. It looks like this:

<meta name="robots" content="noindex">

This tag tells search engines not to index the page, even if other websites link to it. It is a stronger signal than robots.txt and the most reliable way to keep a page out of search results. One important caveat: Google must be able to crawl the page to see the tag, so you also need to remove the robots.txt block for that URL; otherwise, the noindex directive is never read.
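
If you can’t add a meta tag to a resource, such as a PDF file, Google also accepts the same directive as an HTTP response header that your server sends along with the file:

X-Robots-Tag: noindex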

Step 3: Get Ready for Indexing

Once you’ve decided which pages you want indexed, make sure they are ready. Rewrite any thin or error-ridden content so it’s high quality, and consider a tool like our AI Rewording Tool to make your existing content more readable and engaging for both users and search engines.

What is a robots.txt file?

A robots.txt file is a set of instructions for search engines that tells them which pages on a website they are allowed and not allowed to crawl.

What does 'Indexed, though blocked by robots.txt' mean?

This error means that a search engine has found a page and decided to index it, even though the robots.txt file has told it not to.

Should I use robots.txt or a noindex tag?

Use a noindex tag to keep a page out of search results; a robots.txt file only controls which pages search engines crawl. Because Google can still index a blocked page it discovers through links, the noindex tag is the stronger and more reliable signal.

How do I fix a robots.txt issue?

To fix a robots.txt issue, review the file for typos and overly broad Disallow rules, and remove any rules that block pages you want indexed. For pages that should stay out of search results entirely, add a noindex tag instead of relying on robots.txt.
