This is already the third post in our Google Webmaster Tools series. Last week we wrote about the Search Appearance section and the Search Traffic section of Google Webmaster Tools. So if you jumped in here and want to start at the beginning, please read those posts first.
Today we’ll be going into the Google Index section, which obviously gives you some insight into how your website is being indexed in Google.
Index Status
The Index Status shows you how many URLs of your website have been found and added to Google’s index:
This can give you a good idea of how your site is doing in the index. If you see this line dropping, for instance, you know there’s an issue. Basically, any major and unexpected change to this graph is something you should look into.
The “Advanced” tab gives you a bit more insight into how all your indexed pages are divided:
As you can see, this also shows you how many of your pages are being blocked by your robots.txt. And you can see how many of your pages have been removed from the index, but more on that in the Remove URLs section below.
There’s something else this graph makes clear. As of March 9th of last year (at the “update” line) Google Webmaster Tools shows the data separately for both HTTP and HTTPS websites. This means that if you moved your site from HTTP to HTTPS since then, you’ll need to add your site again, using the red “Add a site” button. Then, fill in the entire URL, including the HTTP or HTTPS part:
Interpretation of the Index Status
There are a few things you should always look for when checking your Index Status:
- Your indexed pages should be a steadily increasing number. This tells you two things: Google can index your site and you keep your site ‘alive’ by adding content;
- Sudden drops in the graph. These mean Google is having trouble accessing (all of) your website. Something is keeping Google out, whether it’s a robots.txt change or a server that’s down: you need to look into it! This could also have to do with the separate HTTP and HTTPS tracking I mentioned above;
- Sudden (and unexpected) spikes in the graph. This could be an issue with duplicate content (such as both www and non-www, wrong canonicals, etc.), automatically generated pages, or even hacks.
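If you suspect a robots.txt change is behind a sudden drop, you can sanity-check your rules yourself before waiting for the graph to update. A minimal sketch using Python’s standard-library robot parser; the rules and URLs below are made-up examples, so substitute your own:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules -- replace with your site's actual file.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# Check whether Googlebot is allowed to crawl a few key URLs.
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))     # True
```

If a page you want indexed comes back as blocked here, you’ve likely found the cause of the drop.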
Content Keywords
The Content Keywords area gives you a pretty good idea of what keywords are important for your website. When you click on the Content Keywords menu item, it’ll give you a nice list of keywords right away:
These keywords are found on your website by Google. This does not mean you’re ranking for these keywords, it just means they’re the most relevant keywords for your site according to Google. You can also extend this list to 200 items, so it’ll give you a pretty broad idea.
This actually tells us a few things about your site. It shows you what Google thinks is most important on your website. Does this align with your own idea of what your website’s about? If you find any keywords here that you didn’t expect, such as “Viagra” or “payday loan”, your site may have been hacked. Conversely, if keywords you’d expect are missing from this list, there are a few things you can check:
- Your robots.txt might be blocking the page(s) that contain the keyword(s) you’re expecting;
- The page containing the keyword might be too new for Google to have crawled it yet;
- Google excludes keywords they consider boilerplate or common words from this list. What they’d consider boilerplate or common differs per site.
Blocked Resources
A new addition (March 11, 2015) to the Google Index section is Blocked Resources. This report shows you how many of your resources are blocked for Google:
The report will show where these resources are hosted. Clicking the host will take you to a second page showing all the blocked files per host. Again, that report will show how many pages are affected per file. These files could be images used on your site, but also CSS or JavaScript files. In that detailed report, you can click on a file and see all the pages it is listed on, as well as the last date Google detected these.
After you have found the blocked files, you can use the Fetch and Render option to view the page as Google sees it and decide whether you need to fix this right away, depending on the impact of the blocked file. Google Webmaster Tools will guide you in fixing the robots.txt block for files hosted on your own site. For blocked resources with a larger impact that are hosted on a third-party site, you’ll have to contact the owner of that host / website and ask them to remove the block. Always ask yourself if you really need that file first!
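For files on your own site, the fix is usually a small robots.txt adjustment. A hypothetical example that unblocks theme CSS and JavaScript while keeping the rest of the directory disallowed (the paths are placeholders for your own setup):

```
# Example only -- adjust the paths to match your own site structure.
User-agent: Googlebot
Allow: /wp-content/themes/*.css
Allow: /wp-content/themes/*.js
Disallow: /wp-content/
```

Googlebot supports `Allow` rules and `*` wildcards, so more specific `Allow` lines can carve exceptions out of a broader `Disallow`.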
Remove URLs
This section of Google Webmaster Tools gives you the power you’d expect: removing URLs from Google’s search results altogether. You can also keep pages out of the search results by disallowing them in your robots.txt, or by password protecting them, if that suits you better.
However, you can also do this from Google Webmaster Tools quite easily:
Just type in your URL and hit “Continue”. The next window will give you three options for removing a URL from the search results.
The first option will completely remove the URL you entered from the search results, along with the cache. You can find the cached version of your website here in Google:
So the first option would remove both that cached version and your entire result. The second option would only remove the cached version from the Google search results. The third option (Remove directory) would remove both of these things for not only that page, but also for every subpage. So removing the directory yoast.com/test/ would also remove yoast.com/test/test1/ and so on.
Be sure to only remove pages you don’t want to be showing up in the Google search results anymore. Don’t use this for crawl errors, 404s or anything like that. Those things should be fixed differently. Also, if the page is meant to stay out of the Google search results, be sure to remove the page from your site (404 or 410 error) or disallow Google from crawling the page in your robots.txt. Be sure to do this within 90 days of using this removal request! Otherwise, your page might get reindexed.
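If your site runs on Apache, one way to make a removed page return a 410 Gone status is the `Redirect` directive from mod_alias in your .htaccess file. A sketch, assuming an Apache setup; the path is an example:

```
# .htaccess -- return "410 Gone" for a removed page (example path)
Redirect gone /test/old-page.html
```

A 410 tells Google the page is gone permanently, which tends to get it dropped from the index a bit more decisively than a plain 404.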
Conclusion
The Google Index section of Google Webmaster Tools is a great place to monitor how Google is handling your site. Whether Google has suddenly stopped indexing your website or has a different idea of what your site’s about, this is the section to find that out.
So be sure to keep an eye on this! And also keep an eye out for the next post on Google Webmaster Tools, which will be about the Crawl section. That post will go a long way in pinpointing how to find out where the issues you found in the Google Index section came from.
That’s it! Please let us know what you think in the comments!
This post first appeared as Google Webmaster Tools: Google Index on Yoast.