WordPress robots.txt: Best-practice example for SEO

Your robots.txt file is a powerful tool when you’re working on a website’s SEO – but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site. Here, we’ll explain how we think webmasters should use their robots.txt file, and propose a ‘best practice’ approach suitable for most websites.

You’ll find a robots.txt example that works for the vast majority of WordPress websites further down this page. If you want to know more about how your robots.txt file works, you can read our ultimate guide to robots.txt.

What does “best practice” look like?

Search engines continually improve the way in which they crawl the web and index content. That means that what was best practice a few years ago may no longer work, or may even harm your site.

Today, best practice means relying on your robots.txt file as little as possible. In fact, it’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation), or when there’s no other option.

Blocking URLs via robots.txt is a ‘brute force’ approach, and can cause more problems than it solves.

For most WordPress sites, the following example is best practice:

# This space intentionally left blank
# If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt
User-agent: *

We even use this approach in our own robots.txt file.

What does this code do?

  • The User-agent: * instruction states that any following instructions apply to all crawlers.
  • Because we don’t provide any further instructions, we’re saying “all crawlers can freely crawl this site without restriction”.
  • We also provide some information for humans looking at the file (linking to this very page), so that they understand why the file is ‘empty’.
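If you'd like to check this reading of the file yourself, here's a minimal sketch using Python's standard-library urllib.robotparser. The example.com URLs and the crawler name are placeholders, and this parser is only one interpretation of the rules, so real crawlers may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The 'empty' robots.txt from the example above.
ROBOTS_TXT = """\
# This space intentionally left blank
# If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt
User-agent: *
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs on a hypothetical site.
    for url in (
        "https://example.com/",
        "https://example.com/wp-content/themes/some-theme/style.css",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the file contains no Disallow rules, the parser reports every URL as allowed for every crawler.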

If you have to disallow URLs

If you want to prevent search engines from crawling or indexing certain parts of your WordPress site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.

Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way’, and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.
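To make the two mechanisms concrete: a meta robots tag is a tag such as <meta name="robots" content="noindex"> in the page's HTML, while the HTTP header variant is an X-Robots-Tag response header carrying the same kind of directives. The sketch below, under our own assumptions, collects the directives a page exposes through either route. The function and class names are hypothetical, and it deliberately ignores crawler-specific forms (such as header values prefixed with a user agent name).

```python
from __future__ import annotations

from html.parser import HTMLParser


def split_directives(value: str) -> set[str]:
    """Split 'noindex, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def robots_directives(headers: dict[str, str], html: str) -> set[str]:
    """Return directives found in X-Robots-Tag headers and meta robots tags."""
    found: set[str] = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= split_directives(value)
    parser = MetaRobotsParser()
    parser.feed(html)
    return found | parser.directives


if __name__ == "__main__":
    # Hypothetical response: header and HTML are made up for illustration.
    headers = {"X-Robots-Tag": "noindex"}
    html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(sorted(robots_directives(headers, html)))
```

Unlike a robots.txt block, both routes require the crawler to fetch the page first, which is exactly what lets it read the directive and keep following the page's links.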

If your site has crawling or indexing challenges that can’t be fixed via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.

Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an x-robots HTTP header).

Why is this ‘minimalism’ best practice?

Robots.txt creates dead ends

Before you can compete for visibility in the search results, search engines need to discover, crawl and index your pages. If you’ve blocked certain URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.
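The dead-end effect can be sketched with a toy crawler. The link graph, paths and function name below are entirely hypothetical, and the model only follows internal links (real search engines can also discover URLs via sitemaps or external links), so treat it as an illustration rather than a description of how any search engine works.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/blog/", "/category/shoes/"],
    "/blog/": ["/blog/post-1/"],
    "/category/shoes/": ["/product/red-sneaker/"],
    "/blog/post-1/": [],
    "/product/red-sneaker/": [],
}


def crawlable_pages(robots_txt: str, start: str = "/", agent: str = "Googlebot") -> set:
    """Breadth-first crawl from start that respects the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    queued = {start}
    queue = deque([start])
    crawled = set()
    while queue:
        page = queue.popleft()
        if not parser.can_fetch(agent, page):
            # Blocked page: it is never fetched, so its links are never seen.
            continue
        crawled.add(page)
        for target in LINKS.get(page, []):
            if target not in queued:
                queued.add(target)
                queue.append(target)
    return crawled


if __name__ == "__main__":
    print(sorted(crawlable_pages("User-agent: *\n")))
    print(sorted(crawlable_pages("User-agent: *\nDisallow: /category/\n")))
```

In this toy graph, blocking /category/ also hides /product/red-sneaker/, even though no rule mentions it, because the only link to it sits on a blocked page.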

Robots.txt denies links their value

One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, search engines won’t crawl it, and they also might not pass on any ‘link value’ pointing to that URL, either to the URL itself or through it to other pages on the site.

Google fully renders your site

People used to block access to CSS and JavaScript files in order to keep search engines focused on those all-important content pages.

Nowadays, Google fetches all of your styling and JavaScript and renders your pages completely. Understanding your page’s layout and presentation is a key part of how it evaluates quality. So Google doesn’t like it at all when you deny it access to your CSS or JavaScript files.

The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for wp-includes in version 4.0.
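To see why rules like these get in the way of rendering, here's a small sketch that checks which assets a crawler would be refused. The rules are an illustrative reconstruction of the old-style approach, not a copy of any real default file, and the asset URLs and function name are made up.

```python
from urllib.robotparser import RobotFileParser

# Illustrative old-style rules (an assumption, not an actual WordPress default).
LEGACY_STYLE_RULES = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

# Hypothetical assets a rendered page might need.
ASSETS = [
    "https://example.com/wp-includes/js/jquery/jquery.min.js",
    "https://example.com/wp-content/plugins/some-plugin/style.css",
    "https://example.com/wp-content/themes/some-theme/style.css",
]


def blocked_assets(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the URLs that the given rules would stop the agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    for url in blocked_assets(LEGACY_STYLE_RULES, ASSETS):
        print("blocked:", url)
```

With these rules, the script and the plugin stylesheet are off limits, so a crawler that renders the page would see it without them.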

Many WordPress themes also use asynchronous JavaScript requests – so-called AJAX – to add content to web pages. WordPress used to block Google from making these requests by default, but we fixed this in WordPress 4.4.

You (usually) don’t need to link to your sitemap

The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines discover the location of your sitemap and, through it, the contents of your site.

We’ve always felt that this was redundant; you should already be adding your sitemap to your Google Search Console and Bing Webmaster Tools accounts in order to access analytics and performance data. If you’ve done that, then you don’t need the reference in your robots.txt file.
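If you do decide to keep a sitemap reference, it's a single Sitemap line holding the full URL. The sketch below shows how a parser picks it up, using Python's standard-library urllib.robotparser (its site_maps() method needs Python 3.8 or later). The sitemap URL and the function name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with no restrictions and one sitemap reference.
ROBOTS_WITH_SITEMAP = """\
User-agent: *

Sitemap: https://example.com/sitemap_index.xml
"""


def sitemap_urls(robots_txt: str) -> list:
    """Return the sitemap URLs listed in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_WITH_SITEMAP))
```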

Read more: Preventing your site from being indexed: the right way »

The post WordPress robots.txt: Best-practice example for SEO appeared first on Yoast.