How Google Assesses Your Content: An In-Depth Data Analysis

  • June 21, 2024

The latest Helpful Content Update (HCU) concluded with the Google March core update, which completed its rollout on April 19, 2024. These updates have seamlessly integrated the helpful content system into Google’s core algorithm.

To scrutinize the changes in Google’s ranking of webpages, data scientists from WLDM and ClickStream collaborated with Surfer SEO. Surfer SEO pulled data based on specific keyword lists.

Implications Of The March Update And Google’s Goals

Google is shifting its focus toward content that offers substantial value to humans rather than to search engine algorithms.

Therefore, the update places a premium on topic authority. Creators should demonstrate comprehensive experience, expertise, authoritativeness, and trustworthiness (E-E-A-T) within their web pages to aid users effectively.

Especially important are Your Money or Your Life (YMYL) pages, where accurate information is paramount due to the potential risks to users’ health or financial well-being.

Google’s Search Liaison, Danny Sullivan, has confirmed that the HCU functions on a page level rather than solely on a site-wide basis.

According to Google:

“This [HCU] update involves refining some of our core ranking systems to help us better understand if webpages are unhelpful, have a poor user experience, or feel like they were created for search engines instead of people. This could include sites created primarily to match very specific search queries.

We believe these updates will reduce the amount of low-quality content on Search and send more traffic to helpful and high-quality sites.”

Additionally, the March 2024 spam update was finalized on March 20, compounding these changes.

SEO Industry Impact

The update significantly impacted numerous websites, altering search rankings and causing considerable fluctuation during the rollout. Some SEO professionals described it as a “seismic shift” in the SEO industry.

Frustratingly, over the past few weeks, Google contradicted its own guidelines and algorithms at the heart of the HCU system by releasing AI-generated search results that included dangerous and incorrect health-related information.

Even now, the Search Engine Results Pages (SERP) show continued volatility. It appears ongoing adjustments to the March update are still occurring.

Background

Methodology

In December 2023, we analyzed the top 30 results on Google SERPs for 12,300 keywords. In April 2024, we expanded our study by examining 428,436 keywords, analyzing search results for 8,460 of them. This year’s study covered 253,800 final SERP results.
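As a quick sanity check on these figures, the short Python sketch below multiplies the number of keywords with analyzed search results by the number of results per SERP. It assumes the 2024 study kept the same top-30 depth as the 2023 baseline; the variable names are ours, for illustration only.

```python
# Figures quoted in the methodology above.
keywords_with_serp_data = 8_460  # April 2024 keywords whose search results were analyzed
results_per_keyword = 30         # assumption: top 30 results per SERP, as in 2023

total_results = keywords_with_serp_data * results_per_keyword
print(f"{total_results:,}")  # 253,800
```

The product matches the 253,800 final SERP results reported for this year's study.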

The 2023 keyword set served as a baseline for the expanded study in 2024, allowing us to understand Google’s ranking signal changes after March and some “rank tremors” that occurred in early April.

For both data sets, we prepended “how to use” to keywords to create informational intent keywords. JungleScout provided access to an ecommerce keyword database, grouped and siloed using Natural Language Processing (NLP). Our study focused on specific product niches.

Correlation And Measurements

We used the Spearman correlation to measure the strength and direction of associations between ranked variables.

Because Google uses hundreds of ranking signals, each one affects rankings only slightly. For that reason, even a .05 correlation is considered significant in SEO ranking studies.

Our Focus Is On-Page Ranking Factors

Our study primarily analyzed on-page ranking signals. Data studies require extensive planning, including resource allocation, so it was by chance that our 2024 study coincided with the end of Google’s most significant ranking changes in over eight years.

Our key metric for the study was comprehensive content coverage, which means thorough or holistic writing about the primary topic or keyword on a page. Each keyword was matched to text on the pages of the top 30 URLs in the SERP. We had highly precise measurements for scoring natural language processing-related topics used on pages.

Another key study goal was understanding whether webpages covering health-sensitive topics differed from non-health pages. Would pages not falling into the now-infamous YMYL category be less sensitive to some ranking factors?

Since Google emphasizes excellent user experience, we pulled data on each webpage’s speed and Core Web Vitals in real time to see if Google considers it a key component of user experience.

Content Score As A Predictor

It’s unsurprising that Surfer SEO’s proprietary “Content Score” turned out to be the best predictor of high rankings compared to any single on-page factor we examined in our study. This held true for both 2023, where the correlation was .18, and 2024, which showed .21.

The score is an amalgamation of many ranking factors, clearly demonstrating that the scoring system reflects meaningful, helpful content for users. The minor change in correlation between the two periods indicates the March update did not significantly alter key on-page signals.

The Content Score consists of many factors, including:

  1. Usage of relevant words and phrases.
  2. Title and H1 tags.
  3. Headers and paragraph structure.
  4. Content length.
  5. Image occurrences.
  6. Hidden content (e.g., alt text of the images).
  7. Main and partial keywords—both their frequency and placement.

… and several other sound SEO practices.

More About Correlations And Measurements In The Study

We selected niches because we wanted domains with multiple URLs to appear in our study. It was crucial to gather many niche and “specialty-oriented” sites, as is the case for most non-mega sites.

Most data studies overlook how a group of URLs from one domain tells a story. The keywords they use are so randomized that mega websites dominate the results.

The narrow topic focus also meant fewer keywords with extreme ranking competition. Many ranking studies predominantly use keywords with over 40,000 monthly searches, but most SEO professionals work for websites that don’t rank in the top 10 for those. This study is biased toward less competitive keywords, disregarding Google keyword search volume, focusing instead on Amazon’s volume.

Our keywords had more than 10 monthly searches on Amazon (via JungleScout). However, prepending “how to use” to the keyword often resulted in fewer than 10 monthly searches on Google.

The “dangerous, prohibited, banned” group was excluded from most comparisons of health vs. non-health. Many were esoteric topics, requiring many words to describe them on Amazon.

Most SEO professionals do not work for the top 50 largest websites. We aim to provide results beneficial for the majority of SEO pros.

Here’s How We Generated Different Keyword Types

For instance, we prepended “buy” to the product keyword “Adobe Professional” to create one keyword and “how to use” to create another.

Product | Category | Search Intent | Prepended Phrase | Keyword
Adobe Professional | Software | Informational | How to use | How to use Adobe Professional
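The keyword construction described here can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function name `make_keyword` and the capitalization rule are assumptions.

```python
def make_keyword(product: str, modifier: str) -> str:
    """Place a modifier phrase in front of a product keyword (hypothetical helper)."""
    phrase = f"{modifier.strip()} {product.strip()}"
    # Capitalize only the first character so the product name keeps its casing.
    return phrase[0].upper() + phrase[1:]


how_to_variant = make_keyword("Adobe Professional", "how to use")
buy_variant = make_keyword("Adobe Professional", "buy")

print(how_to_variant)  # How to use Adobe Professional
print(buy_variant)     # Buy Adobe Professional
```

Running the same product list through different modifiers yields one keyword set per search intent, so the sets can be compared like for like.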

We examined data using the Spearman rank-order correlation formula, which measures the correlation between two variables on a scale from -1 to 1. A correlation coefficient of 1 or -1 indicates a perfect monotonic relationship between the two variables, while 0 indicates none.

We chose Spearman’s correlation over Pearson’s because Google search results are ordinal data: positions are ranked by decreasing importance, and the gap between adjacent positions is not a fixed quantity.

Spearman’s correlation compares the ranks of two datasets, which aligns better with our goal than Pearson’s. We used .05 as our threshold for a meaningful correlation.
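For readers who want to see the mechanics, here is a minimal pure-Python sketch of Spearman's correlation: rank both variables (averaging tied ranks), then take the Pearson correlation of the ranks. The sample positions and word counts are invented for illustration and are not from our data set. Flipping the sign of the SERP position, so that a positive value means "associated with better rankings," is also an assumption about convention.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant (rho is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample: SERP positions (1 = best) and page word counts.
positions = [1, 2, 3, 4, 5]
word_counts = [2000, 1800, 1500, 1600, 900]

# Negate positions so a positive rho means "more of the factor, better ranking".
rho = spearman([-p for p in positions], word_counts)
print(round(rho, 2))  # 0.9
```

Because only the ranks enter the calculation, the result does not change if word counts are rescaled or if one page is an extreme outlier, which is the property that makes Spearman's a natural fit for ranked search results.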

A correlation of .08 indicates a ranking signal twice as strong as a .04 signal. A correlation greater than .05 is positive, one below -.05 is negative, and anything between -.05 and .05 shows no meaningful correlation. A negative correlation means that as the variable increases, rankings get worse.
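One way to read the .05 cutoff is as a simple three-way rule, sketched below in Python. Treating values below -.05 as negative correlations is our interpretation of the threshold, and the function name `classify` is illustrative. The sample values are the correlations reported elsewhere in this article.

```python
THRESHOLD = 0.05  # cutoff used in this study


def classify(rho: float, threshold: float = THRESHOLD) -> str:
    """Label a correlation as positive, negative, or not meaningful."""
    if rho > threshold:
        return "positive"
    if rho < -threshold:
        return "negative"
    return "no meaningful correlation"


reported = [
    ("Content Score, 2023", 0.18),
    ("Content Score, 2024", 0.21),
    ("Missing common keywords and phrases, 2023", -0.11),
    ("Missing common keywords and phrases, 2024", -0.13),
]

for label, rho in reported:
    print(f"{label}: {rho:+.2f} -> {classify(rho)}")
```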

Many domains in the study are from outlier or niche topics and are small because of minimal investment in time and money. That is primarily why they don’t rank well.

We looked for “controls” to show that two domains with equal time, web development, and financial investment, for example, could still rank differently based on health vs. non-health topics.

While correlation does not imply causation, understanding controls and independent variables was crucial. Google uses thousands of factors, making isolation difficult. Nonetheless, correlations have been trusted in science for centuries.

Keyword Categories And Classifications

Our keywords were search terms related to products.

Focusing on narrow niches allowed us to separate very non-YMYL topics from those that were YMYL.

Image from author, June 2024

CBD and vape keywords, banned from Google Ads, were ideal for our health-related keyword set. The FDA considers muscle building and weight loss among the riskiest categories on Amazon.

We selected non-health categories as they were innocuous niches.

The “dangerous, prohibited, banned” keywords included products manually removed from Amazon’s Seller Central page list.

Each category fits into one of three classifications.

Image from author, June 2024

Detailed Findings And Actionable Insights

Importance Of Topic Authority And Semantic SEO

The largest on-page ranking factor is the usage of topics related to the searched keyword phrase, indicative of topic authority and semantic SEO.

For “missing common keywords and phrases,” we found a correlation of -.11 in December 2023, which strengthened to -.13 in April 2024. A higher negative correlation, like -.
