Google Webmaster Tools: Crawl

The section in Google Webmaster Tools that works most closely with our WordPress SEO Premium plugin is the Crawl section. In our premium plugin, you’ll find a Webmaster Tools section that lists all pages Google somehow did not find on your website. You can easily import these into the plugin and redirect the ones that need redirecting, so the crawl error is resolved.

But there is more in that fourth ‘chapter’ of Google Webmaster Tools. Following our posts on Search Appearance, Search Traffic and Google Index, this article digs into crawl errors, crawl stats, how Google sees your website, and your XML sitemaps.

Crawl Errors

This section lists two types of errors: site errors and URL errors. Site errors simply tell you whether your DNS works, whether the server can be reached without trouble (no timeouts, for instance) and whether Google can access your robots.txt file.

Google Webmaster Tools: Crawl / Site error

Google provides background information on the error (when it occurred, how often in the last 90 days). If things like this happen too often (as in more than once or twice a year without advance warning), be sure to contact your hosting company or switch hosts altogether.

The URL errors section is divided into multiple sections and subsections. First, you can check for errors that Google gets when crawling as a number of different devices. On the desktop tab, for instance, we find the number of server errors, access denied errors and page not found errors.

Google Webmaster Tools: crawl / URL errors

Please monitor these errors, as too many of them could send a signal of low quality (bad maintenance) to Google. A 404 can be redirected, as mentioned, with our WordPress SEO Premium plugin, or with a bit more work directly in your .htaccess file. After that, check the checkbox in front of the URL and click Mark as fixed (this just cleans up the list; it won’t do much besides that).
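
If you go the .htaccess route, a redirect usually comes down to a single line. A minimal sketch, assuming an Apache server and made-up URLs:

# Permanently redirect a removed or moved URL to its replacement (example paths)
Redirect 301 /old-page/ https://example.com/new-page/

After adding a rule like this, request the old URL yourself (or use Fetch as Google) to confirm it now returns a 301 to the right page.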

We have found in our site reviews that a lot of people either ignore the errors here or forget to mark errors as fixed. This only leads to a very long list of errors, so clean that list up now and then. If you want to check whether any of these URLs are already fixed, click the link to find more information on the error and use Fetch as Google to see if the URL is now accessible for Google (and then click Mark as fixed, of course).

Soft 404s

These tabs can also show Soft 404s. A soft 404 occurs when a page as such exists, but has an empty ‘content area’. Let me elaborate on that. Google does a fine job of locating content on a page. It understands your header, sidebar, footer and content area. That also means that you can have a fully packed sidebar and footer, but Google will still return a Soft 404 when the content area is empty. And by empty we also mean a category page with no posts, or any other page stating there is no content available. Or a page that just says 404 Not Found but returns a 200 OK server response anyway (which happens more often than you think). This also occurs when you link to, for instance, an internal search page that just doesn’t return any results. There is no real content, but an almost empty page is returned anyway.

Although the server will return a 200 OK message for that page, Google will consider it a (Soft) 404. You don’t want these pages on your website for a number of reasons. For one, these pages are obviously not very user-friendly. But besides that, Googlebot will have to go over all these pages for no reason at all (as they lack content), which prevents your site from being crawled efficiently. After all, you don’t want Googlebot to spend time trying to see what’s on all these non-existing pages. Either add content to these pages or noindex them completely. In the case of an empty category or tag page, consider removing the category or tag. You’re not using it anyway.
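
If you go the noindex route, the usual way is a meta robots tag in the page’s head (a minimal sketch; in WordPress, our plugin can set this for you without touching the template):

<!-- Keep this thin page out of the index, but let Googlebot follow its links -->
<meta name="robots" content="noindex, follow">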

Smartphone visits

For your smartphone visits, Google also tests for faulty redirects and blocked URLs. A blocked URL is a page that is blocked in robots.txt for Googlebot-Mobile for smartphones. Simply check whether these are intentionally blocked, and otherwise change your robots.txt to allow access.
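
For illustration, a blocking rule of this kind could look like the following (hypothetical directory and user-agent group); removing the Disallow line, or narrowing it down, lifts the block:

# Hypothetical example: this keeps Google's mobile crawler out of an entire directory
User-agent: Googlebot-Mobile
Disallow: /mobile/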

Faulty redirects occur when a page redirects to “an irrelevant landing page instead of the equivalent, smartphone-optimized version of the desktop page”. That could, for instance, be a redirect to your homepage instead of a redirect to the exact same page on your mobile site. By the way, check the question mark next to labels in Google Webmaster Tools for more information on the terminology used there. These explanations really come in handy sometimes :)

Lastly, the URL errors section can also contain a News tab, which shows crawl errors for Google News content. Of course, that only applies to news sites. Check the Google News requirements to see if your website qualifies for Google News at all; otherwise, don’t bother looking for this tab.

In the end, you’ll just want this to be the only text in the Crawl Error section:

Google Webmaster Tools: No crawl errors

Crawl Stats

This is your handy overview of Googlebot activity on your website. It shows you the pages it crawled per day, the number of bytes downloaded per day and the time spent downloading a page. Depending on the number of pages of your website, this might be handy information.

Google Webmaster Tools: crawled per day

We think this is great for trend watching. If you have made changes to your site structure or robots.txt, or have for instance added XML sitemaps for the very first time, that should show in these graphs.

In case these stats show a drastically declining line, or even a flat line at zero, there’s probably something really wrong with either the website (robots.txt might be blocking Googlebot) or your server (it could be down, for instance). Again, monitor this.

Fetch as Google

As mentioned, in case of any crawl errors, you should look into what happened and why. One of the tools Google Webmaster Tools provides is a way to view your website as Google does: you can fetch a page as Google. Either click the link at Crawl Errors and click the Fetch as Google link in the pop-up, or go to the Fetch as Google section in Google Webmaster Tools to enter a URL manually:

Google Webmaster Tools: Fetch as Google

Note that in the image above, all pages were rendered quite a while ago – perhaps they can already be fetched the right way – so visit them to check this and perhaps even do a refetch. In the image, we see three different kinds of statuses (these apply to both the Fetch and the Fetch and Render commands):

  • Partial: The page can be partially rendered, as some elements are probably not displayed as intended or not at all, for instance because you are blocking CSS or JS in your robots.txt file. If you click the line with the partial status in the overview, you’ll actually be taken to a snapshot of how Google rendered your page. On that page, Webmaster Tools will also tell you which resources it could not get, so you’ll be able to fix this.
  • Not Found: The page can’t be found. This might be because a redirect isn’t in place yet after a URL or structure change, or perhaps you simply deleted the page (the server returns a 404 error).
  • Unreachable: Googlebot didn’t have the patience to wait for your website to load (make it faster), or your server simply replied that it could not allow the request for the URL.

Of course there are more. Other statuses you might find here are:

  • Complete: That’s the one you want. Google managed to crawl the entire page.
  • Redirected: Either the server or your website itself (HTML/JS) told Google to visit another URL.
  • Not authorized: Your server tells Google that access to the URL is restricted or that it has been blocked from crawling (the server returns a 403 error).
  • DNS not found: Perhaps you entered the wrong URL? The domain name seems incorrect.
  • Blocked: Your robots.txt tells Google to bugger off.
  • Unreachable robots.txt: Google can’t reach your robots.txt at all. More on testing the robots.txt below.
  • Temporarily unreachable: Either the server took too long to reply or too many consecutive requests were made to the server for different URLs.
  • Error: An error occurred when trying to complete the fetch (contact Webmaster Tools product support in this case).

By clicking the URL, you can see the rendered page as seen by both Googlebot and a visitor, so you can make a judgement on the impact of the blocked file:

Google Webmaster Tools: Render

In this case, the impact is clearly low.

Robots.txt Tester

Last week, Joost explained a lot about the Partial status at Fetch as Google in his post WordPress robots.txt Example. You really need to make sure your robots.txt is in order.

By the way, you might be wondering if you really need that robots.txt file. Actually, you don’t. If you think Google should crawl all sections on your server, you could leave it out. The Robots.txt tester will return this message in that case:

Google Webmaster Tools: No robots.txt

In the Robots.txt tester, Google will show the robots.txt you are using and tell you any and all issues Google finds:

Google Webmaster Tools: robots.txt warning

That’s a warning: Googlebot ignores that crawl delay. If for some reason you do want to limit Google’s crawling, do so using the gear icon in the upper right of Google Webmaster Tools (at Site Settings > Crawl Rate – new crawl rates are valid for 90 days). Please note that this is not how often Google visits your site; it’s the speed of Googlebot’s requests during the crawl of your website.
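
For reference, the warning above is typically triggered by a line like this (hypothetical file); Googlebot simply ignores the directive:

User-agent: *
# Googlebot ignores Crawl-delay; set Google's crawl rate via the gear icon instead
Crawl-delay: 10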

Google Webmaster Tools: robots.txt error

Hm. And what do you want Google to do with /wordpress/wp-includes/? By the way, as in this example, we see a lot of webmasters adding a link to their sitemap to their robots.txt. No problem, but why not simply add it in Google Webmaster Tools instead? More on that later.

Google Webmaster Tools: robots.txt wrong comment

This is another ‘syntax not understood’. Comments in robots.txt should be added using a hash (#) instead:
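
For instance (an illustrative rule, not the exact file from the screenshot):

# Keep crawlers out of the WordPress admin
User-agent: *
Disallow: /wp-admin/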

Google Webmaster Tools: robots.txt comment

Works like a charm.

Google Webmaster Tools: robots.txt error

Ouch. The horror. Google could not find a user-agent in this robots.txt – but it’s there, right? It’s actually preceded by a space, and that immediately triggers a ‘syntax not understood’ for the robots.txt test.
It doesn’t mean all 20 restrictions in that robots.txt will be ignored, by the way. This seems to be a strict test, but Google is very capable of filtering that space. Google actually encourages whitespace for readability. But strictly speaking, it shouldn’t be in there.

Visit Webmaster Tools for even more information on the robots.txt syntax.

Test allowed / disallowed

One more thing. When you want to test whether a page or directory on your site can or cannot be reached by Googlebot, or for instance by Googlebot-News or Googlebot-Mobile, you can do that as well in this section, right below the robots.txt code.

If we take the last example above and test the /Tests/ directory in it, you’ll see that it can indeed be crawled if we follow the strict rules of the Robots.txt tester:

Google Webmaster Tools: robots.txt allowed

Although the text ‘Allowed’ is green, it wouldn’t be a good thing if this directory could actually be crawled and indexed. As mentioned, though, the rule preceded by a space is in fact keeping Googlebot out, judging by the Google search result pages:

Robots.txt tester isn't flawless

Feel free to add any insights on this particular issue in the comments below.

If you test a page or directory and find that it is blocked like this:

Google Webmaster Tools: robots.txt URL blocked

the test tool will also tell you what line in your robots.txt is causing this:

Google Webmaster Tools: robots.txt blocked by line

All in all, if you are actively using your robots.txt, make sure it does what you intended it to do. The Robots.txt tester will help you a lot with that.

Sitemaps

It doesn’t matter whether you are setting up a new WordPress site or have just installed our WordPress SEO plugin: activate XML sitemaps and remove any other plugin that creates them as well. Don’t forget to redirect the old XML sitemap, probably at /sitemap.xml, to ours at /sitemap_index.xml. If you experience any issues, check our knowledge base.
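
That sitemap redirect can be done in a few lines of .htaccess, for example (a sketch, assuming Apache with mod_rewrite):

# Send requests for the old sitemap to the index sitemap our plugin generates
RewriteEngine On
RewriteRule ^sitemap\.xml$ /sitemap_index.xml [R=301,L]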

Having said that, if you have a proper XML sitemap, go to Google Webmaster Tools and test and add it at Sitemaps. Unfortunately Google doesn’t allow you to add a tested sitemap immediately after testing. Yes, this is my feature request, Google ;)

Google Webmaster Tools: Tabs on sitemaps

These sitemaps can be added manually, but perhaps Google already found some. These are listed on the All tab:

Google Webmaster Tools: sitemaps found by Google

We often get questions about image XML sitemaps. Images are actually already included by our plugin in, for instance, post-sitemap.xml and page-sitemap.xml:

Google XML Sitemaps
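
Stripped down, such an entry looks roughly like this (a simplified sketch with placeholder URLs; the image namespace declaration on the urlset element is omitted here):

<url>
  <loc>https://example.com/sample-post/</loc>
  <image:image>
    <image:loc>https://example.com/wp-content/uploads/sample-image.png</image:loc>
  </image:image>
</url>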

Back to Google Webmaster Tools. First, you want to make sure /sitemap_index.xml contains all the content types you want Google to index. Please check the XML sitemaps section in our plugin and see if you can exclude any post types or taxonomies from the XML sitemaps; this usually already fixes a lot of warnings and errors. Especially on shops, where sitemaps can be created for shipping classes and clothing sizes, for instance, that would be my first piece of advice.

Second, you add /sitemap_index.xml, which will be added immediately, along with any sitemaps listed in that index sitemap. If for some reason the sitemap still contains content you’d rather not have indexed, simply change that in our plugin and resubmit it in Google Webmaster Tools. Note that you can manually add and delete sitemaps, but the ones that are added automatically, for instance via an index sitemap, can only be removed by a resubmit.

I thought about adding a list of possible errors and warnings to this article as well. Seriously. But when I found that Webmaster Tools actually added internal navigation on their Sitemap Errors page, it seemed to make sense to simply link that page.

Common warnings include Google not being able to reach a page due to a long response time, or a URL that is included in an XML sitemap but excluded in robots.txt.

Errors vary from invalid date formats to sitemaps not being found (a 404 on your sitemap is never a good idea). A sitemap can also be empty, or a required tag can be missing.

Indexed versus submitted content

Another thing you might wonder about is the difference between indexed and submitted content types (pages, video, images). These are the red and blue bars in this section. The red bar (indexed) is usually a bit lower, as Google isn’t crawling your entire site at once. Time is precious, so Google spiders a (large) number of pages at a time, but if your site structure goes a gazillion levels deep, chances are Googlebot isn’t getting to the deepest pages in a single crawl. It’s not as if Google bookmarks where it ended up and starts from there the next time it crawls your website. This emphasizes the need for a good internal link structure, well-formatted sitemaps and things like that.

URL Parameters

Let’s start with Google’s warning here:

Use this feature only if you’re sure how parameters work. Incorrectly excluding URLs could result in many pages disappearing from search.

If you are using URL parameters, like the default s for WordPress search, please check this section. When discussing this post with Joost, he told me from experience that, for instance in the case of a site migration, things might go terribly wrong if this isn’t done properly.

In this section, you can tell Google how to handle your parameters. When clicking Add a Parameter, you’ll get a pop-up with these options:

Google Webmaster Tools: URL parameters

I entered the ‘s’ for search, and have to decide whether that parameter changes the content shown on the page or merely tracks usage without affecting the content. Google calls these active and passive URL parameters, respectively. Active parameters can for instance be used for sorting, pagination and sometimes even translations or categorization. Passive parameters are usually for tracking or referrals, like Magento’s SID (session ID) and Google’s own utm_source.

Now if you feel the parameter “changes, reorders or narrows page content”, and pick Yes in the above select box, you’ll be presented with four more options. You can set here how you want Google to handle the parameter:

  1. Let Googlebot decide: a general option if you’re not sure what to choose here.
  2. Every URL: Every URL using this parameter is an entirely new page or product.
  3. Only URLs with specified value: This indicates to Google that you only want URLs with a specific value for this parameter to be crawled, ignoring the rest, for instance to avoid duplicate content due to sorting options.
  4. No URLs: Don’t crawl pages with this parameter at all. Avoiding duplicate content is a reason to use this one as well.

Note that instead of using URL parameter settings for options 3 and 4, you could also set the right canonical on all of these pages.
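
A canonical simply points all parameter variations of a URL at the clean version, for instance (placeholder URLs):

<!-- On /shirts/?sort=price and similar variations, point search engines at the clean URL -->
<link rel="canonical" href="https://example.com/shirts/" />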

TL;DR?

Sorry to disappoint you: there is no Too Long; Didn’t Read for this one. In the previous posts on Google Webmaster Tools, we have already emphasized the importance of checking your site now and then, or monitoring it actively. Google Webmaster Tools helps a lot with that, and this is one of the longest (and in my opinion most interesting) sections in it.

Feel free to drop any addition or question related to this section in the comments. Looking forward to it!


WordPress robots.txt: Best-practice example for SEO

Your robots.txt file is a powerful tool when you’re working on a website’s SEO – but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site. Here, we’ll explain how we think webmasters should use their robots.txt file, and propose a ‘best practice’ approach suitable for most websites.

You’ll find a robots.txt example that works for the vast majority of WordPress websites further down this page. If you want to know more about how your robots.txt file works, you can read our ultimate guide to robots.txt.

What does “best practice” look like?

Search engines continually improve the way in which they crawl the web and index content. That means what used to be best practice a few years ago doesn’t work anymore, or may even harm your site.

Today, best practice means relying on your robots.txt file as little as possible. In fact, it’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation), or when there’s no other option.

Blocking URLs via robots.txt is a ‘brute force’ approach, and can cause more problems than it solves.

For most WordPress sites, the following example is best practice:

# This space intentionally left blank
# If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt
User-agent: *

We even use this approach in our own robots.txt file.

What does this code do?

  • The User-agent: * instruction states that any following instructions apply to all crawlers.
  • Because we don’t provide any further instructions, we’re saying “all crawlers can freely crawl this site without restriction”.
  • We also provide some information for humans looking at the file (linking to this very page), so that they understand why the file is ’empty’.

If you have to disallow URLs

If you want to prevent search engines from crawling or indexing certain parts of your WordPress site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.
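
The meta tag goes in a page’s head; for non-HTML files such as PDFs, the HTTP header variant is the way to go. A hedged Apache sketch (assumes mod_headers; adjust the file pattern to your situation):

# Ask search engines not to index any PDF files on the site
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, follow"
</FilesMatch>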

Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way’, and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.

If your site has crawling or indexing challenges that can’t be fixed via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.

Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an x-robots HTTP header).

Why is this ‘minimalism’ best practice?

Robots.txt creates dead ends

Before you can compete for visibility in the search results, search engines need to discover, crawl and index your pages. If you’ve blocked certain URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.

Robots.txt denies links their value

One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, not only won’t search engines crawl it, but they also might not pass any ‘link value’ pointing at that URL on through it to other pages on the site.

Google fully renders your site

People used to block access to CSS and JavaScript files in order to keep search engines focused on those all-important content pages.

Nowadays, Google fetches all of your styling and JavaScript and renders your pages completely. Understanding your page’s layout and presentation is a key part of how it evaluates quality. So Google doesn’t like it at all when you deny it access to your CSS or JavaScript files.

Previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for wp-includes in version 4.0.

Many WordPress themes also use asynchronous JavaScript requests – so-called AJAX – to add content to web pages. WordPress used to block Google from this by default, but we fixed this in WordPress 4.4.

You (usually) don’t need to link to your sitemap

The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines to discover the location and contents of your site.
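
For completeness, such a reference is a single line in robots.txt (placeholder URL):

Sitemap: https://example.com/sitemap_index.xml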

We’ve always felt that this was redundant; you should already be adding your sitemap to your Google Search Console and Bing Webmaster Tools accounts in order to access analytics and performance data. If you’ve done that, you don’t need the reference in your robots.txt file.

Read more: Preventing your site from being indexed: the right way »


Google Webmaster Tools: Google Index

This is already the third post in our Google Webmaster Tools series. Last week we wrote about the Search Appearance section and the Search Traffic section of Google Webmaster Tools. So if you jumped in here and want to start at the beginning, please read those posts first.

Today we’ll be going into the Google Index section, which obviously gives you some insight into how your website is being indexed in Google.

Index Status

The Index Status shows you how many URLs of your website have been found and added to Google’s index:

Webmaster Tools Index Status

This can give you a good idea of how your site is doing in the index. If you see this line dropping, for instance, you’d know there’s an issue. Basically any major and unexpected change to this graph should be something you look into.

Actually, the “Advanced” tab gives you just a bit more insight into how all your indexed pages are divided:

Webmaster Tools Index Status https yoast com

As you can see, this shows you how many of your pages are being blocked by your robots.txt as well. And you can also see how many of your pages have been removed from the index, but more on that in the next chapter.

There’s something else this graph makes clear. As of March 9th of last year (at the “update” line) Google Webmaster Tools shows the data separately for both HTTP and HTTPS websites. This means that if you moved your site from HTTP to HTTPS since then, you’ll need to add your site again, using the red “Add a site” button. Then, fill in the entire URL, including the HTTP or HTTPS part:

Webmaster Tools Home

Interpretation of the Index Status

There are a few things you should always look for when checking out your Index Status:

  • Your indexed pages should be a steadily increasing number. This tells you two things: Google can index your site and you keep your site ‘alive’ by adding content;
  • Sudden drops in the graph. This means Google is having trouble accessing (all of) your website. Something is blocking Google out, whether it’s robots.txt changes or a server that’s down: you need to look into it! This could also have to do with the separate HTTP and HTTPS tracking I mentioned above;
  • Sudden (and unexpected) spikes in the graph. This could be an issue with duplicate content (such as both www and non-www, wrong canonicals, etc.), automatically generated pages, or even hacks.

Content Keywords

The Content Keywords area gives you a pretty good idea of what keywords are important for your website. When you click on the Content Keywords menu item, it’ll give you a nice list of keywords right away:

Webmaster Tools Content Keywords

These keywords are found on your website by Google. This does not mean you’re ranking for these keywords, it just means they’re the most relevant keywords for your site according to Google. You can also extend this list to 200 items, so it’ll give you a pretty broad idea.

This actually tells us a few things about your site. It shows you what Google thinks is most important for your website. Does this align with your own idea of what your website’s about? For instance, if you find any keywords here that you didn’t expect, such as “Viagra” or “payday loan”, this could mean that your site has been hacked. And besides that, if you expected keywords that you can’t find in this list, there are a few things you can check:

  • Your robots.txt might be blocking the page(s) that contain the keyword(s) you’re expecting;
  • The page containing the keyword might not be old enough yet for Google to have crawled it;
  • Google excludes keywords they consider boilerplate or common words from this list. What they’d consider boilerplate or common differs per site.

Blocked Resources

A new addition (March 11, 2015) to the Google Index section is Blocked Resources. This report shows you how many of your resources are blocked for Google:

Google Webmaster Tools: Blocked Resources

The report will show where these resources are hosted. Clicking the host will take you to a second page showing all the blocked files per host. Again, that report will show how many pages are affected per file. These files could be images used on your site, but also CSS or JavaScript files. In that detailed report, you can click on a file and see all the pages it is listed on, as well as the last date Google detected these.

After you have found the blocked files, you can use the Fetch and Render option to view the page as Google does and decide whether you need to fix this right away or not, depending on the impact of the blocked file. Google Webmaster Tools will guide you in fixing the robots.txt block for files hosted on your own sites. For blocked resources with a larger impact that are hosted on a third-party site, you’ll have to contact the owner of that host / website and ask them to remove the block. Always ask yourself if you really need that file first!
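
If the blocked files live on your own site, the fix usually means loosening the robots.txt rule that matches them. A hypothetical example, assuming a file that currently blocks the whole plugins directory:

# Keep the plugins directory blocked, but let Google fetch the CSS and JS it needs to render pages
User-agent: *
Allow: /wp-content/plugins/*.css
Allow: /wp-content/plugins/*.js
Disallow: /wp-content/plugins/

Or, even simpler and in line with our general robots.txt advice, drop the Disallow line altogether.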

Remove URLs

This section of Google Webmaster Tools gives you the power you’d expect: to remove URLs from Google’s search results altogether. You can actually also block those pages from coming up in the search results by disallowing the page in your robots.txt. Or you can make it a password protected page, if that suits you better.

However, you can also do this from Google Webmaster Tools quite easily:

Webmaster Tools Remove URLs https yoast com

Just type in your URL and hit “Continue”. The next window will give you three options for removing a URL from the search results.

Webmaster Tools: Remove URLs

The first option will completely remove the URL you entered from the search results, along with the cache. You can find the cached version of your website here in Google:

yoast com Google Search

So the first option would remove both that cached version and your entire result. The second option would only remove the cached version from the Google search results. The third option (Remove directory) would remove both of these things for not only that page, but also for every subpage. So removing the directory yoast.com/test/ would also remove yoast.com/test/test1/ and so on.

Only remove pages you don’t want showing up in the Google search results anymore. Don’t use this for crawl errors, 404s or anything like that; those should be fixed differently. Also, if the page is meant to stay out of the Google search results, make sure to remove the page from your site (404 or 410 error) or disallow Google from crawling it in your robots.txt. Do this within 90 days of using the removal request! Otherwise, your page might get reindexed.

Conclusion

The Google Index section of your Google Webmaster Tools is a great section for monitoring how Google’s handling your site. Whether Google has suddenly stopped indexing your website, or has a different idea of what your site’s about, this is the section to find that out.

So be sure to keep an eye on this! And also keep an eye out for the next post on Google Webmaster Tools, which will be about the Crawl section. That post will go a long way in helping you pinpoint where the issues you found in the Google Index section came from.

That’s it! Please let us know what you think in the comments!


Google Webmaster Tools: Search Traffic

Following Thijs’ article on Search Appearance in Google Webmaster Tools, I wanted to talk to you today about the second section: Search Traffic. Although the common thread is search traffic, the subsections deal with a lot of different topics, like search queries and links.

In this article, we will explain all that can be found in these subsections.

Search Queries

Ever since Google started using SSL back in 2011, we webmasters have had to deal with that annoying ‘keyword not provided’ in our Google Analytics stats. Google created an ‘alternative’ in an “aggregated list of the top 1,000 search queries that drove traffic to their site for each of the past 30 days through Google Webmaster Tools.” I’m under the impression this isn’t limited to the top 1,000 anymore (see the screenshot below, which shows 9,142 queries), and it goes up to 90 days if I am not mistaken.

Google Webmaster Tools: Search Queries

The interesting thing is that if we click the ‘With Change’ button right above the bottom table, we can actually see how traffic changed, and perhaps more importantly, how clicks from Google to the pages listed at ‘Query’ changed over time. Depending on the time span you select in the upper right corner, you could actually use this to test meta descriptions and titles.

Note that a number of these things can also be found in Google Analytics, but here the results can be filtered on Image search, Mobile search and Video search, and also on location, which comes in handy for people targeting domestic or specific markets abroad. The mobile filter, with more and more traffic coming from mobile devices, is of course very interesting to keep a keen eye on – what are these specific visitors looking for? And are these pages optimized for mobile?

Right below the title in that screenshot (Search Queries), we find a second tab, “Top Pages”. It is similar, but instead of the search keyword, this page shows the URLs of your most visited pages. Always ask yourself if the pages that top that list are also the pages you want to rank for. If not, it could be necessary to get back to the drawing board and create a new site structure around these pages. That way you can leverage the rankings of these pages for the rest of your website. As mentioned, there is overlap with Google Analytics. The main difference is the absence of ‘keyword not provided’ ;-)

Links to Your Site

This section in Google Webmaster Tools provides what the title says: information on links to your website. It’s divided into three sections:

Google Webmaster Tools: Links to Your Site

It’s a logical threesome: which websites link to your content, which content is linked to the most, and which anchor / link texts are used the most.

Interesting to see Pinterest bringing in the most traffic, right? For our regular readers: yes, we used this example in our social marketing post as well. But now Tumblr has even risen above Google traffic. In this case, social is the new Google.

For this site, it’s clear that the domain name is used most often as anchor text, which is quite common. The second one is a general ‘visit site’, but 17 of the top 25 anchor texts were along the lines of ‘the 50 best’, ’15 healthy ..’ and lists like that. For this specific site, that really seems to work, and of course we would encourage them to create more lists like that. See for yourself which anchors are used most often; it will probably give you a general idea of what kind of posts work and should be written for your website.

Of course this is also emphasized by the most linked content.

Google Webmaster Tools: Linked pages

This is also interesting: 30K+ links from 43 domains probably means some ‘site wide’ links (links on every page of a website). In that case, it might pay to do some further investigating in tools like MajesticSEO or OpenLinkProfiler (of course there are more tools like that). Find the site wide links and see if you can improve those backlinks, for instance by being linked in articles as well, instead of just in a sidebar. That will improve the quality of the link (not the traffic, per se).

Internal Links

On a lot of (WordPress) sites, most internal links will go to tag and (product) category pages. That seems to make sense. This section tells you if I am right about this. In one of the projects we worked on, we found this:

Google Webmaster Tools: Internal Links

Everything about it is odd :) Why is that one tag page getting so many internal links? This might be the case when you have just started tagging your products and this is the first tag you have used. If so, this list should look a lot different in a few weeks.

The second odd thing is that Google tells us a .htm page is linked from 76 pages. Luckily, Google Webmaster Tools allows you to find that page using the search bar at the top of this page. It will tell you what pages link to it (you can also click the blue link in the table, by the way):

Google Webmaster Tools: Links to a specific page

Somewhere on that site, there seems to be a remnant of an old site (current pages don’t have that .htm extension). The page at hand actually returns a 404, unfortunately, so this is something that should be looked into. Another reason to check Google Webmaster Tools on a regular basis.

Manual Actions

Let’s hope this section is totally empty for you. The best message here is “No manual webspam actions found.” Google uses this section in Google Webmaster Tools to tell you which ‘penalties’ your website has received from the friendly people at Google. That isn’t a bad thing in itself: in our ongoing quest for better websites (did I mention our site reviews in this post already?), the quality of a website is very important. The goal of a website should always be to serve the visitor the best information or products in the best possible way, and preferably Google should be one of those visitors. The visitor comes first. If you find a manual webspam action here, Google found something that doesn’t serve your visitor. There are a number of flavors:

  • Unnatural links
    Make sure links from and to your site are valuable, not just there for SEO. Preferably, your links come from and point to related content that is valuable for your readers. Another unnatural link is one that comes from a detected link network.
  • Hacked
    A message stating your site’s probably hacked by a third party. Google might label your site as compromised or lower your rankings.
  • Hidden redirects
    Links to, for instance, affiliate sites that you have hidden using a redirect (e.g. cloaking). Cloaking and ‘sneaky’ redirects are a violation of Google’s Webmaster Guidelines.
  • Thin content
    If your website is flooded with low-quality content, for instance duplicate content or pages with little to no text, Google will value your website lower.
  • Hidden text
    Back in the day, this worked very well: white text on a white background, stuffed with keywords you wanted to rank for. Well, Google got smarter and will find that text. Again: write for your visitor, not for Google.
  • Plain Spam
    Again, you’re not following Google’s guidelines. Automatically generated content, scraped content and aggressive cloaking could be the reason Google considers your website pure spam.
  • Spammy freehosts
    If the majority of the sites on the same server as yours are considered spammy, the entire server might be blacklisted. And yes, your site might unintentionally suffer from this as well. Make sure to choose the right hosting company.
  • Spammy structured markup
    If you use rich snippets for too many irrelevant elements on a page, or mark up content that is hidden from the visitor, that might be considered spammy. Mark up what’s necessary, and realize that not everything is necessary.

All these things are unnatural optimization or a sign of low quality. Luckily, Google provides information on recommended actions via Webmaster Tools. However, these might be lengthy processes and take some hard work on your side. But hey, you were impatient and wanted that quick win.

In conclusion: prevent any messages from turning up in this Manual Actions section.

International Targeting

If you are running an international company, chances are your website is available in multiple languages. Although there is more than one way to do this, the best way is to set up different sites per top-level domain and link these websites using hreflang tags. Alternatives would be telling Google this via a sitemap or your HTTP headers. In this section, Google Webmaster Tools tells you whether the implementation is correct.
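
In its simplest form, every language version references all versions (including itself) in its head. A sketch with placeholder domains:

<link rel="alternate" hreflang="en" href="https://example.com/" />
<link rel="alternate" hreflang="nl" href="https://example.nl/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />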

Besides that, and often overlooked, you can actually select a geographical target audience here, on the second tab Country:

Google Webmaster Tools: International Targeting

If your business solely focuses on one country, why not tell Google that, right?

Mobile Usability

I could go on and on about Mobile Usability / User Experience. Mobile traffic is really important for a lot of websites and Google does a nice job emphasizing that to us webmasters. In Google PageSpeed Insights, there is a mobile UX section, as there is in Google Webmaster Tools:

Google Webmaster Tools: Mobile Usability

Webmaster Tools will not only tell you what is wrong, but also on how many and which pages these errors occur (just click the double arrow on the right, next to the error).

In my opinion, most mobile errors highlighted here can be fixed with just a bit of CSS knowledge.
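
Two of the usual suspects are a missing viewport declaration and fixed-width content. A minimal sketch of the fixes (the .content class is just a placeholder for your own selectors):

<!-- In the <head>: tell mobile browsers to use the device width -->
<meta name="viewport" content="width=device-width, initial-scale=1">

/* In your stylesheet: stop wide elements from forcing horizontal scrolling on small screens */
@media (max-width: 480px) {
  .content { width: 100%; }
  img { max-width: 100%; height: auto; }
}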

Want to know more about Google Webmaster Tools?

As mentioned, we’ll be going over Webmaster Tools in a series of articles. This is only article number two, so please stay tuned and visit our website frequently to find out all there is to know.

If you have any additions to this section of Google Webmaster Tools, like alternative ways to look at this data or ways to combine it with other reports in Google Webmaster Tools or Google Analytics, we are very much looking forward to your thoughts. Any insightful additions might be used in future posts, of course! Thanks in advance.


Google Search Console: Search Appearance

There are a lot of ways to check how your website’s doing these days. The most common one people use is probably Google Analytics, which is definitely a great tool for monitoring your site. However, since the ‘not provided’ development, it’s become pretty hard to monitor your SEO efforts with it. And unfortunately, most tools that can monitor your SEO efforts come at a hefty price. Today I’ll be highlighting one of the free tools: Google Search Console.

This is actually the first post in a series on Google Search Console. We’ll be going over every major menu item in Google Search Console, starting with Search Appearance.

What is Google Search Console?

Before going into Google Search Console, you might be wondering, what is it in the first place? Google themselves explain it the following way in their meta description of Google Search Console:

“Google Search Console provides you with detailed reports about your pages’ visibility on Google.”

This is definitely true, but it leaves out quite a lot of other things. Google Search Console looks at a lot more than ‘just’ your pages’ visibility on Google. It looks at everything that influences that visibility, such as backlinks, crawling (errors), robots.txt, sitemaps, etc. And on top of that, Google Search Console actually still shows you quite a lot of search query data.

You can find your own Google Search Console by logging into your Google account here. And if you haven’t set up your GSC yet, you can follow the steps here.

On the 20th of May 2015, Google announced that the name Google Webmaster Tools did not cover the tool’s user base anymore; only a part of that user base could really be called ‘webmasters’. For that reason, Google renamed the tool Google Search Console (GSC).


Search Appearance

Google Search Console: Dashboard

The Search Appearance menu item gives you a lot of insight into just that: what your website looks like in the search results. You can actually click the ‘i’ for more information on each search appearance element:

Google Webmaster Tools - Sitelinks

You can select every part of a search result to get more information on that specific part and how to influence how it looks.


Structured Data

Under Structured Data you’ll find a count of all the pages that have some kind of structured data attached to them, such as schema.org or RDFa markup. Structured data means you give certain elements on a page a sort of label, such as ‘Product‘. This makes it clear to the big search engines (Google, Bing, Yahoo) that there’s a product on this page. On top of that, you can add things such as ratings or prices for your product, which might also show up in the search results. We recommend adding schema.org data using JSON-LD.
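
As an illustration, JSON-LD for a product could look something like this (a hedged sketch; all values are placeholders, and you should only mark up properties that are actually visible on the page):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example product",
  "image": "https://example.com/wp-content/uploads/example-product.jpg",
  "offers": {
    "@type": "Offer",
    "price": "19.95",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>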

Google Search Console: Search Appearance - structured data

If any pages on your site don’t have their structured data set up right, Google Search Console will give you a red line named “Items with Errors”. GSC automatically sorts by the number of “Items with Errors”, so the most important faults will be on top. To view which specific pages have these errors, just click one of the lines in the table. This will take you to a list of all the specific pages that have errors with the Data Type you selected. You’ll probably be able to create a nice list of to-do’s for your site, just based on these URLs.

Rich Cards

Sometimes Google tries to answer the user’s question right in the search result pages. It does that by presenting the user with a so-called Rich Card. That could be a recipe, a restaurant listing with a rating, or even a product result with just that bit of extra information on availability or pricing. These are just examples.

If your website is set up the right way, it’s using structured data to set up these rich cards. In Google Search Console, under Search Appearance, you’ll find any and all errors Google has found in the data you provided for this. That is if Google has detected any rich card structured data on your site. These errors are divided into three levels:

  1. The top level lists a sum of errors or recommendations. These are conveniently grouped by card type and you can click a row for more details.
  2. The second level of the report gives you a list of all the critical (errors in required fields) and non-critical errors for a selected card type. Again, you will find more details after clicking a row.
    There are three kinds of statuses here: Invalid (critical, fix now), Enhanceable (nice to fix) and Fully-Enhanced (job well done).
  3. The third level allows you to view all pages with cards of a selected type affected by the selected rule. After clicking a row, you’ll find a suggested fix.

Data Highlighter

The Data Highlighter actually makes fixing the issues you’ve found in the Structured Data section a lot easier. For instance, choose one of the URLs that had a faulty Structured Data setup and tell GSC what kind of information you want to highlight:

Google Webmaster Tools Data Highlighter

This will bring you to a live view of that page and you’ll be able to select any element on the page. By selecting an element you’ll be given a choice of what you want to highlight that specific element for. For example, for an Article, you’ll be given these markups to add to the corresponding element on the page:

Data Highlighter Tagger

This makes adding Structured Data, for Google at least, really as easy as a few clicks.

HTML Improvements

This page is really straightforward. It basically checks all your website’s meta descriptions, title tags, and content that wasn’t indexable. If Google Search Console finds meta descriptions that are too long, too short or duplicated, it will show a number of pages higher than 0, and the link will become clickable.

The same goes for missing, duplicate, too long, too short or non-informative title tags and for any content that GSC thought was non-indexable. Clicking the linked word will take you to a list of meta descriptions or page titles that are faulty. You’ll be able to find on which pages exactly this is happening. Some more to-do’s to add to that list! If you find writing decent meta descriptions hard, read this post to learn how!

Accelerated Mobile Pages

Accelerated Mobile Pages, or AMP, is a way to make your pages more easily accessible on mobile devices. Note that for AMP to work properly, you need to create matching, valid AMP pages with the right schema.org markup. And you need to make sure these AMP pages are properly linked. We have written a number of articles on the subject.

Go read these. While it might seem like you need to set up a second website, there are obviously tools that will help you keep up with the possibilities and future development of AMP.
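
‘Properly linked’ boils down to two link elements: the regular page points to its AMP version, and the AMP page points back to the regular page as its canonical (placeholder URLs):

<!-- On the regular page -->
<link rel="amphtml" href="https://example.com/sample-post/amp/">
<!-- On the AMP page -->
<link rel="canonical" href="https://example.com/sample-post/">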

Search Console: accelerated mobile pages

In Google Search Console you will find a debug report for your AMP pages. Google set up this report as the first layer of information about your AMP pages: there is more to come in this report. The current report provides a quick overview of your AMP errors, which you can analyze per specific error type and URL. It will help you find the most common AMP issues on your website, so you can fix these.

Optimize your search appearance!

So you see, there’s a lot you can do about what your search results in Google look like, and a lot to optimize to make things clearer for Google. Optimizing your search appearance might only have a minor impact on your ranking, but it will definitely increase your click-through rate from Google. And that’s worth a little effort!

What do you think? Do you have experience using Google Search Console like this? Or do you have some additional tips? Let us know in the comments!

Read on: ‘Google Search Console: Crawl’ »

Website maintenance: Check and fix 404 error pages

If your website is important to your business, it’s essential to schedule time to keep it running smoothly. Therefore we regularly write about the things you should do to keep your site in shape. In this post, we’ll write about the most basic of all: checking for 404 errors.

Note: this post does not cover the required elements of a good 404 page, we do have a post on that, though: Thoughts on 404 error pages.

404 errors and broken links

One of the most annoying things that can happen to a visitor is to hit a 404 “page not found” error on your website. Search engine spiders tend to not like such errors much either. Annoyingly, search engines often encounter other types of 404s than your visitors, which is why the first section of this post is split in two:

1. Measuring visitor 404 error pages

If you use the MonsterInsights plugin, it’ll automatically tag your 404 pages for you. Then, if you go into your Google Analytics account, go to Behavior → Site Content → Content Drilldown and search for 404.html, you’ll find a ton of info about your 404s:

Google Analytics report showing 404 error pages

You’ll see URLs like this:

/404.html?page=/wordpress/plugin/local-seo/&from=https://yoast.com/articles/wordpress-seo/

This tells you two things:

  • The 404 URL was /wordpress/plugin/local-seo/ (it lacks an s after plugin)
  • It was linked to from our WordPress SEO article.

Using this info, you can fix the 404 and go into the article and fix the link.

As you can see from the above screenshot, we actually get 404s too. We break things all the time because our website is a constant work in progress! Making sure that you notice it when you’re breaking things is a good way of not looking stupid for too long though.

2. Measuring bot 404 error pages

Besides the 404s your visitors hit, search engines will also encounter 404s on your site, and those can be quite different. You can find the 404s that search engine spiders encounter by logging into their respective webmaster tools programs. There are three webmaster tools programs that can give you indexation reports in which they tell you which 404s they encountered:

  1. Bing Webmaster Tools under Reports & Data → Crawl Information
  2. Google Search Console under Coverage → Errors
  3. Yandex Webmaster under Indexing → Excluded Pages → HTTP Status: Not Found (404)

One of the weird things you’ll find if you’re looking into those Webmaster Tools programs is that search engine spiders can encounter 404s that normal users would never get to. This is because a search spider will crawl just about anything on most sites, so even links that are hidden will be followed.

If you’re serious about website maintenance, you might want to find these 404s before search engines encounter them. In that case, spidering your site with a tool like Screaming Frog will give you a lot of insight. These tools are built specifically to behave just like search engine spiders and will, therefore, help you find a lot of issues.

Fixing 404 errors

Now that we’ve found all these 404 errors, it’s time to fix them. If you know what caused the 404 and you can fix the link that caused it, it’s best to do that. This will be the best indication of the quality of your site for both users and search engines.

As search engines will continue to hit those URLs for quite a while, it actually makes sense to still redirect those faulty URLs to the right pages as well. To create those redirects, there are several things you can do:

  • Create them manually in your .htaccess or your NGINX server config
    While this is not for the faint of heart, it’s often one of the fastest methods available if you have the know-how and the access to do it (see the sketch after this list).
  • Create them with a redirect plugin
    There are several redirect plugins on the market, the most well-known one being Redirection. This is a lot easier, but has the disadvantage of being a lot slower: to do the redirect, the entire WordPress install has to load first. This usually adds half a second to a second to the load time for that particular redirect.
  • Create them with our Yoast SEO Premium plugin
    Our Yoast SEO Premium plugin has a redirect module that allows you to make redirects with the ease of the WordPress interface, but also allows you to save those to your .htaccess file or an NGINX include file, so they get executed with the speed of the first option above. It actually also has a few other nifty options: you can get the 404 errors from Google Search Console straight in your WordPress install and redirect them straight away, and it’ll add a nice button to your WordPress toolbar if you’re on a 404 page.
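
For the manual route in the first option above, and taking the broken URL from earlier in this post as an example, an NGINX redirect could look like this (a sketch with an example domain; this goes inside your server block):

# Redirect the misspelled URL to the correct one with a permanent (301) redirect
location = /wordpress/plugin/local-seo/ {
    return 301 https://example.com/wordpress/plugins/local-seo/;
}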

Check for image / embed errors

If you look at your server logs, you’ll see 404 errors of a different type too: 404s for broken images or broken video embeds. You might also have errors that don’t show up in your logs, like broken YouTube video embeds. They don’t break the entire page, but they do look sloppy. These types of errors are harder to find, because webmaster tools programs don’t report them as reliably and you can’t track them with something like Google Analytics either.

The easiest method to find these broken images and embeds is using one of the aforementioned spiders. Screaming Frog, in particular, is very good at finding broken images. Another method is to check your server logs and go through them, searching for a combination of a 404 status and “.jpg” or “.png”.
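
On the command line, something like this gives you a quick list (assuming a standard access log; adjust the path to wherever your server keeps its logs):

# Show requests that returned a 404 and were for .jpg or .png files
grep ' 404 ' /var/log/nginx/access.log | grep -E '\.(jpg|png)'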

How often should you check for 404 errors?

You should be checking your 404s at least once every month, and on a bigger site, every week. It doesn’t really depend on how many visitors you have, but much more on how much content you have and create, and how much can go wrong because of that. The first time you start looking into and fixing your 404 error pages, you might find out that there are a lot of them and that it can take quite a bit of time… Try to make it a habit, so you’ll at least find the important ones quickly.

Read more: Content maintenance for SEO: research, merge and redirect »
