Content analysis: methodological choices explained

In the 3.3 update of Yoast SEO, we’ve added six content checks that help improve the readability of your text. This post explains the choices we made while developing these readability checks and gives our arguments for them. It is indispensable reading if you want to understand how our content analysis works and why we designed the assessments the way we did.

Keep reading: ‘The importance of quality content’ »

Little research on what is ‘right’ in writing!

There are a lot of things we know about readability. Short sentences are easier to read than long sentences. Passive voice causes distant writing. However, linguistics is not at all an exact science. It is hard to decide when a piece of text is readable and when it is not. Still, we had to decide these things in order to make readability checks.

We hired a linguist (Irene Strikkers) to help us. She figured out which checks were useful and why. In this article, we first explain how the development of these readability checks came about. After that, we explain why each of the 6 assessments is important and describe the exact assessment.

The process of developing these assessments was thorough, and we worked closely with experts in the fields of SEO copywriting and linguistics. That being said, we can well imagine that some of you have questions about the way we measure things or disagree with the operationalization of one of the assessments. Perhaps you would like to see other checks added to Yoast SEO. We’ve made a special form that’ll allow you to give us your feedback. We are curious about your ideas and your arguments!

Our strategy

Why readability?

We started out by doing some serious research. Irene (our linguist) began by checking the existing tools, plugins, and apps in the field of automatic text review. She wanted to understand what was already on the market.
At the end of her research, Irene had reviewed 24 different tools and apps. She found out that some of them didn’t check what they were supposed to check, while others were seriously slowing down a website. That already gave us lots of clues about what not to do.

The most important finding of Irene’s research was that a lot of apps and tools focused on spelling and grammar, but neglected readability. Readability is of great importance! It determines whether people understand the message of your text. Although correct spelling and grammar are important, they are certainly not the only factors that influence readers. A readable text is a text one reads all the way through. Irene’s research clearly showed a lack of readability checks among most of the existing writing tools.

Deciding upon measurements

In the first stage of developing the assessments, we analyzed the competition. We decided upon the measurements of our assessments through a thorough investigation of other tools and checks. Which measurements are commonly used in grammar and spelling checks? How did these other tools measure similar readability assessments?

In addition, we analyzed the readability of texts we considered to be very readable and of texts we considered to be a very bad read. In this phase, we leaned strongly on the expertise of our linguist. Why are certain checks important? What are the theoretical reasons for making an assessment? In this phase, we also calculated readability scores of texts we considered well written as well as of texts we considered badly written. We used all the information from these two phases to come to the first version of our assessments.

Fine-tuning measurements with some research

In the final phase of developing the assessment, we used our own content analysis. In this phase, we put our content checks to the test. We analyzed texts of lots of news sites and blogs using our own content analysis tool. We checked out texts of the Guardian, of Moz, but also articles of low-key mum and travel blogs. We selected a total of 75 articles from very different blogs and news sites.

The purpose of the analysis of these 75 different articles was to make sure that the assessments in the content analysis are sufficiently distinctive. If the bullet of the content analysis were green all of the time, that wouldn’t help our users. But if it were almost impossible to get a green bullet, that would lead to much frustration as well. Our research gave us a clear overview of the readability of many different blog posts.

Our research initially showed that more than 40% of the texts scored an overall red bullet on the content analysis. It turned out that very few articles had enough transition words. We decided to lower the demands on transition words a bit, as most articles weren’t able to meet them. In the end, about 35% of the articles scored a red bullet, 30% scored an orange bullet and 35% scored a green bullet.

After fine-tuning the assessments in this last phase, we settled on the final measurements of all the content checks. We feel our instrument is useful and distinctive enough. However, if you feel otherwise, please let us know!

Read on: ‘SEO copywriting: the Ultimate guide’ »

Measurements

In the remainder of this article, we will discuss the different assessments. We’ll first explain the importance of an assessment and then describe the exact measurement of each assessment. Finally, we’ll discuss the measurement of the overall content score. For now, the content analysis will only be available in English. Of course, we are already working on making it available in other languages.

Subheadings

Most readers are lazy and quickly bored. You want to convince them to read your text in a matter of seconds. Before deciding to read your text, readers tend to scan your text. 

Research has shown that people generally scan a text in an F-shaped pattern. As a writer, you can guide your readers by providing them with clear subheadings. Good subheadings will not only give them a quick overview of the topics discussed, they also make the structure of your text clearly visible. Moreover, if readers decide to read your text, they’ll already know what your paragraphs will be about. This will make understanding the content much easier.

Subheadings should be evenly distributed throughout your text. You should try to cover one topic in the text after each subheading. Too few subheadings could mean that you did not cover all your topics with a subheading. That’ll make the structure of your text less visible to your reader. Too many subheadings will make the text messy and cluttered, and will not add structure at all.

Measurement subheadings

The subheading distribution check assesses the length of the text that follows each subheading. If the text after a subheading contains fewer than 300 words, you will receive a green bullet. Text after a subheading that contains more than 300 and fewer than 350 words will result in an orange bullet. Articles in which the text after a subheading contains more than 350 words will score a red bullet.

If the article does not contain any subheadings, you will score a red bullet.
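The rule above can be sketched in a few lines of Python. This is only an illustration, not the plugin’s actual code: the function name is made up, we assume the longest stretch of text after any subheading decides the bullet, and we assume that exactly 300 words still counts as green and exactly 350 as orange, since the description leaves those boundaries open.

```python
def subheading_bullet(words_after_each_subheading):
    """Rate subheading distribution.

    Takes a list with the word count of the text following each
    subheading. An empty list means the article has no subheadings.
    """
    if not words_after_each_subheading:
        return "red"  # no subheadings at all

    # Assumption: the longest section decides the rating.
    longest = max(words_after_each_subheading)
    if longest > 350:
        return "red"
    if longest > 300:  # boundary handling at 300 and 350 is an assumption
        return "orange"
    return "green"


print(subheading_bullet([120, 250, 280]))
print(subheading_bullet([120, 320]))
print(subheading_bullet([]))
```

Feeding the function word counts rather than raw text keeps the sketch independent of how headings are detected in your editor.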

Paragraphs

Readers like bite-sized pieces of information. Long paragraphs are scary and discourage people from reading. You should therefore make sure that paragraphs remain rather short.

Measurement paragraph length

Texts in which every paragraph contains fewer than 150 words will score a green bullet. Texts containing paragraphs with more than 150 and fewer than 200 words will score an orange bullet. Texts containing paragraphs of more than 200 words will receive a red bullet.
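As a rough sketch, the paragraph check could look like this in Python. The function name is hypothetical, words are counted by simply splitting on whitespace, and treating exactly 150 words as green and exactly 200 as orange is our assumption, because the thresholds above do not say what happens at those exact values.

```python
def paragraph_length_bullet(paragraphs):
    """Rate a text by the word count of its longest paragraph.

    `paragraphs` is a list of strings, one per paragraph.
    """
    # Whitespace splitting is a simplification of real word counting.
    longest = max((len(p.split()) for p in paragraphs), default=0)
    if longest > 200:
        return "red"
    if longest > 150:  # boundary handling at 150 and 200 is an assumption
        return "orange"
    return "green"


short_paragraph = "word " * 80
long_paragraph = "word " * 180
print(paragraph_length_bullet([short_paragraph, short_paragraph]))
print(paragraph_length_bullet([short_paragraph, long_paragraph]))
```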

Sentences

Your sentences should not be too long either. The longer your sentences are, the harder they are to process, because readers have to keep all the words and relationships in their working memory. Therefore, try to avoid sentences longer than 20 words.

Measurement sentence length

A text in which at most 25% of the sentences contain more than 20 words will get a green bullet. A text in which more than 25% and less than 30% of the sentences contain more than 20 words will get an orange bullet. Texts in which more than 30% of the sentences contain more than 20 words will score a red bullet.
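Here is a minimal Python sketch of the sentence length check. It is an illustration under stated assumptions, not the plugin’s implementation: the function names are made up, the sentence splitter is deliberately naive, a share of up to 25% long sentences is treated as green, and a share of exactly 30% is assumed to be orange.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_length_bullet(sentences, max_words=20):
    """Rate a text by the share of sentences longer than `max_words`."""
    if not sentences:
        raise ValueError("need at least one sentence")
    long_count = sum(1 for s in sentences if len(s.split()) > max_words)
    percentage = 100 * long_count / len(sentences)
    if percentage > 30:
        return "red"
    if percentage > 25:  # exactly 30% counts as orange here (assumption)
        return "orange"
    return "green"


text = "This is short. So is this one! And this?"
print(sentence_length_bullet(split_sentences(text)))
```

Abbreviations such as "e.g." will trip up this splitter, so a real check needs something more robust.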

Transition words

Using transition words is like putting cement between your sentences. The relation between two sentences becomes apparent by the use of transition words. Readers will understand your content much better if you make proper use of these kinds of words.

With transition words, you indicate relationships both between and within paragraphs. They indicate whether a conclusion is coming up, or maybe a comparison or an enumeration. When readers know what to expect next, they’ll be able to process your text more easily.

Measurement transition words

If at least 30% of the sentences in your text contain a transition word, the bullet will be green. If more than 20% and less than 30% of your sentences contain a transition word, your bullet will be orange. The bullet will be red if less than 20% of the sentences of your text contain a transition word. That’s less than 1 in 5 sentences.
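The transition word check can be sketched as follows. Note the assumptions: the word list below is a tiny made-up sample for illustration and not the list the plugin uses, only single-word transitions are matched, the function name is hypothetical, and a share of exactly 20% is treated as orange because the thresholds do not cover that value.

```python
import re

# Hypothetical sample list for illustration only; not the plugin's word list.
SAMPLE_TRANSITION_WORDS = {"however", "therefore", "moreover", "because", "finally"}


def transition_word_bullet(sentences, transition_words=SAMPLE_TRANSITION_WORDS):
    """Rate a text by the share of sentences containing a transition word."""
    if not sentences:
        raise ValueError("need at least one sentence")
    with_transition = sum(
        1
        for s in sentences
        # Lowercase the sentence and compare its words against the list.
        if transition_words & set(re.findall(r"[a-z']+", s.lower()))
    )
    percentage = 100 * with_transition / len(sentences)
    if percentage >= 30:
        return "green"
    if percentage >= 20:  # exactly 20% counts as orange here (assumption)
        return "orange"
    return "red"


sentences = ["However, we like it.", "We like it.", "We like it."]
print(transition_word_bullet(sentences))
```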

Passive voice

Passive voice occurs if the noun or noun phrase that would be the object of an active sentence (such as Yoast SEO calculates your SEO score) appears as the subject of a sentence with passive voice (The SEO score is calculated by Yoast SEO).

Passive voice results in distant writing. Active voice is much more engaging. We’d like to discourage you from using passive voice altogether. However, some sentences just get really awkward when written in the active voice. That’s why we’ve set the recommended maximum percentage of passive sentences to 10%.

Read more: ‘The passive voice’ »

Measurement passive voice

If less than 10% of the sentences of your text are in passive voice, the bullet will be green. You’ll score an orange bullet if between 10% and 15% of your sentences are in passive voice. If more than 15% of the sentences of your text are in passive voice, you’ll score a red bullet.
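Detecting passive voice reliably is a linguistic problem of its own, so the sketch below only covers the thresholds. It assumes you already know how many sentences are passive; the function name and parameters are made up for this example.

```python
def passive_voice_bullet(passive_sentences, total_sentences):
    """Rate a text by its percentage of passive sentences.

    Both arguments are counts; detecting passive sentences is out of
    scope for this sketch.
    """
    if total_sentences <= 0:
        raise ValueError("need at least one sentence")
    percentage = 100 * passive_sentences / total_sentences
    if percentage > 15:
        return "red"
    if percentage >= 10:  # 10% up to and including 15% is orange
        return "orange"
    return "green"


print(passive_voice_bullet(1, 20))  # 5% passive
print(passive_voice_bullet(3, 20))  # 15% passive
print(passive_voice_bullet(4, 20))  # 20% passive
```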

Flesch Reading Ease

Flesch Reading Ease measures textual difficulty of a reading passage in English (note: in languages other than English, Flesch is seriously unreliable). The lower the score, the more difficult the text is. The Flesch readability score uses the sentence length (number of words per sentence) and the number of syllables per word in an equation to calculate the reading ease.
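For reference, here is the equation as a small Python function. The constants are those of the standard published Flesch Reading Ease formula for English; whether the plugin applies them exactly like this is an assumption on our part. The function takes plain counts, because counting syllables automatically is a heuristic that deserves its own discussion.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula for English text.

    words, sentences and syllables are total counts for the passage.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# 100 words in 5 sentences with 150 syllables:
# 206.835 - 1.015 * 20 - 84.6 * 1.5 = 59.635
print(round(flesch_reading_ease(100, 5, 150), 3))
```

Longer sentences and longer words both pull the score down, which is why a lower score means a harder text.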

Measurement Flesch Reading Ease

If the Flesch Reading Ease score of your text is higher than 60, the bullet will be green. If the Flesch Reading Ease score is between 50 and 60, the bullet will turn orange. Your article will receive a red bullet if the Flesch Reading Ease score is lower than 50.
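Mapping a score to a bullet is then a simple threshold check. The function name is made up for this sketch, and we read "between 50 and 60" as including both 50 and 60.

```python
def flesch_bullet(score):
    """Map a Flesch Reading Ease score to a bullet colour."""
    if score > 60:
        return "green"
    if score >= 50:  # 50 up to and including 60 is orange
        return "orange"
    return "red"


print(flesch_bullet(72.4))
print(flesch_bullet(59.635))
print(flesch_bullet(41.0))
```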

Keep reading: ‘Flesch reading ease: use it!’ »

Measurement of overall content score

The current content analysis of Yoast SEO contains 6 different content checks. Of course, we are already developing some new ones ;-). All 6 checks are equally important in the calculation of the overall content score. A red bullet equals 3 penalty points, while an orange bullet equals 2 penalty points. If your article scores 7 or more penalty points, the overall content bullet will be red. If your article scores 5 or 6 penalty points, your article will receive an orange bullet. Articles with 0, 2, 3 or 4 penalty points will be rewarded with the much-wanted green bullet.

In order to score an overall green content bullet, you can have at most one red bullet (3 penalty points) or two orange bullets (4 penalty points).
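The penalty point calculation can be sketched like this. The function and constant names are invented for the example; the point values and cut-offs are the ones described above.

```python
# Penalty points per bullet colour, as described in the text.
PENALTY_POINTS = {"green": 0, "orange": 2, "red": 3}


def overall_bullet(check_bullets):
    """Combine the bullets of the individual checks into one overall bullet."""
    penalty = sum(PENALTY_POINTS[bullet] for bullet in check_bullets)
    if penalty >= 7:
        return "red"
    if penalty >= 5:
        return "orange"
    return "green"


# One red bullet and five green ones: 3 penalty points.
print(overall_bullet(["red", "green", "green", "green", "green", "green"]))
# One red and one orange bullet: 5 penalty points.
print(overall_bullet(["red", "orange", "green", "green", "green", "green"]))
```

A total of exactly 1 penalty point cannot occur, since the smallest penalty is 2. That is why the green range is listed as 0, 2, 3 or 4.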