Utilizing Python to Test SEO Theories: A Comprehensive Guide (And Why It’s Essential)

  • July 8, 2024

When dealing with websites that experience significant traffic, implementing SEO recommendations can result in substantial gains or devastating losses. To minimize the potential risks of SEO implementations going awry, one effective strategy is to pre-test search engine rank factors using machine learning models.

In addition to pre-testing, performing split tests is the most reliable approach to validate SEO theories before deciding whether to implement them sitewide.

This guide will outline the steps necessary to use Python to test your SEO theories effectively.

Choose Rank Positions

One main challenge in testing SEO theories is the requirement for large sample sizes to ensure statistically valid test conclusions.

Split tests, popularized by Will Critchlow of SearchPilot, often focus on traffic-based metrics such as clicks. This works well for large enterprises with ample traffic, but smaller websites may find traffic-based metrics to be too infrequent, resulting in lengthy experiments.

For small to mid-sized companies, considering rank positions can be more practical. Often, these websites rank for target keywords but do not yet achieve high traffic volumes.

Over the testing period, multiple rank position data points for various keywords can be collected daily, weekly, or monthly. This method tends to require a shorter time frame to reach the minimum sample size compared to traffic metrics, making it ideal for non-enterprise clients.

Google Search Console Is Your Friend

For rank positions, Google Search Console (GSC) is a convenient and cost-effective data source, assuming it is properly set up. GSC provides an API for extracting data points over time and filtering for specific URL strings.
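As a rough sketch of what that extraction can look like, the snippet below builds a Search Analytics request body filtered to a URL string and flattens a response into a pandas DataFrame. The helper names (build_gsc_request, rows_to_frame), the "/blog/" filter, and the sample rows are illustrative assumptions, not the author's code. The live call is left as a comment because it needs an authorised client.

```python
import pandas as pd

def build_gsc_request(start_date, end_date, url_contains, row_limit=25000):
    """Build a Search Analytics query body for daily page/query rows."""
    return {
        "startDate": start_date,   # ISO dates, e.g. "2024-06-01"
        "endDate": end_date,
        "dimensions": ["date", "page", "query"],
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "page",
                "operator": "contains",
                "expression": url_contains,
            }]
        }],
        "rowLimit": row_limit,
    }

def rows_to_frame(response, dimensions):
    """Flatten the API response into one DataFrame row per result row."""
    records = []
    for row in response.get("rows", []):
        record = dict(zip(dimensions, row["keys"]))
        record.update({m: row[m] for m in ("clicks", "impressions", "ctr", "position")})
        records.append(record)
    return pd.DataFrame(records)

body = build_gsc_request("2024-06-01", "2024-06-30", "/blog/")

# With an authorised google-api-python-client service object, the call would be:
# response = service.searchanalytics().query(siteUrl=SITE_URL, body=body).execute()

# Made-up rows in the API's response shape, for illustration only.
sample_response = {
    "rows": [
        {"keys": ["2024-06-01", "https://example.com/blog/a", "seo testing"],
         "clicks": 3, "impressions": 40, "ctr": 0.075, "position": 6.2},
        {"keys": ["2024-06-02", "https://example.com/blog/a", "seo testing"],
         "clicks": 1, "impressions": 35, "ctr": 0.029, "position": 7.8},
    ]
}
gsc_df = rows_to_frame(sample_response, body["dimensions"])
print(gsc_df)
```

Keeping the request builder and the response parser separate makes it easy to loop over date ranges or URL patterns without touching the parsing logic.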

While GSC data may not be perfect, it is consistent, which is sufficient for this purpose.

Filling In Missing Data

GSC reports data only for URLs and dates that recorded impressions, so you will need to create rows for the missing dates and fill in the data. Use pandas functions like merge() (similar to VLOOKUP in Excel) to add the missing rows per URL and input the desired data.

For traffic metrics, the value will be zero, while for rank positions, it could be the median (assuming the URL was ranking) or 100 (assuming it wasn’t ranking). You can find the necessary code here.
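The following is a minimal sketch of that fill-in step, not the linked code. It assumes a small made-up export with date, page, clicks, and position columns, and shows both fill options for rank position side by side.

```python
import pandas as pd

# Hypothetical GSC export: only dates on which a URL had impressions appear.
gsc = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-03", "2024-06-02"]),
    "page": ["/a", "/a", "/b"],
    "clicks": [2, 1, 4],
    "position": [8.0, 12.0, 3.0],
})

# Build every page x date combination for the test period.
all_dates = pd.date_range(gsc["date"].min(), gsc["date"].max(), freq="D")
grid = pd.MultiIndex.from_product(
    [gsc["page"].unique(), all_dates], names=["page", "date"]
).to_frame(index=False)

# Left-merge so that missing page/date pairs become rows of NaN.
expanded = grid.merge(gsc, on=["page", "date"], how="left")

# No row from GSC is treated as no traffic that day.
expanded["clicks"] = expanded["clicks"].fillna(0)

# Option 1: assume the URL was still ranking and use its own median position.
page_median = expanded.groupby("page")["position"].transform("median")
expanded["position_median_fill"] = expanded["position"].fillna(page_median)

# Option 2: assume the URL was not ranking at all.
expanded["position_100_fill"] = expanded["position"].fillna(100)

print(expanded)
```

Which fill to use depends on what you believe about the gaps: the median keeps the distribution tight, while 100 penalises URLs that dropped out of the results.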

Check the Distribution and Select Model

Understanding the distribution of your data is crucial, as it dictates the appropriate model for evaluating your SEO theory. Using Python, you can visualize and analyze the data’s distribution:

# Assumes ab_expanded is a pandas DataFrame with a numeric 'position' column
from plotnine import (
    ggplot, aes, geom_histogram, geom_vline, labs, scale_y_continuous,
    theme_light, theme, element_text, element_blank,
)

ab_dist_box_plt = (
    ggplot(ab_expanded.loc[ab_expanded['position'].between(1, 90)],
           aes(x='position')) +
    geom_histogram(alpha=0.9, bins=30, fill="#b5de2b") +
    # Red vertical line marks the median position
    geom_vline(xintercept=ab_expanded['position'].median(), color="red", alpha=0.8, size=2) +
    labs(y='# Frequency\n', x='\nGoogle Position') +
    scale_y_continuous(labels=lambda x: ['{:,.0f}'.format(label) for label in x]) +
    theme_light() +
    theme(legend_position='bottom',
          axis_text_y=element_text(rotation=0, hjust=1, size=12),
          legend_title=element_blank())
)

[Chart: histogram of Google rank positions with the median marked by a red line. Image from author, July 2024]

The above chart shows a positively skewed distribution: most observations cluster in the top positions on the left of the chart, with a long tail stretching toward lower rankings on the right. To run this code, ensure the required libraries are installed using pip install pandas plotnine.

Now you can determine a suitable test statistic for your SEO hypothesis. Because rank positions here are skewed rather than normally distributed, a non-parametric model that does not assume normality is the appropriate choice.
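A quick numeric check can back up what the chart suggests. The sketch below uses made-up positions and a rule-of-thumb cutoff of 1 for skewness. Both are assumptions for illustration, not a formal normality test.

```python
import pandas as pd

# Made-up rank positions; in practice use the 'position' column of your data.
positions = pd.Series([1, 2, 2, 3, 3, 4, 5, 6, 8, 11, 15, 24, 40, 67])

skewness = positions.skew()  # above 0 means a longer tail to the right
mean_pos, median_pos = positions.mean(), positions.median()
print(f"skewness={skewness:.2f}, mean={mean_pos:.1f}, median={median_pos:.1f}")

# Rule of thumb (an assumption): clearly skewed data points toward a
# rank-based, non-parametric test rather than a t-test.
if abs(skewness) > 1:
    suggested = "non-parametric (e.g. Mann-Whitney U)"
else:
    suggested = "parametric may be acceptable"
print(suggested)
```

A mean well above the median is another sign of the same right-hand tail.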

Minimum Sample Size

The chosen model helps determine the minimum sample size required to give the test enough statistical power, so that a real difference between groups is likely to be detected and a significant result is unlikely to be a product of random chance. This involves simulating multiple random distributions matching the observed pattern for both test and control groups.
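To make the idea concrete, here is a simplified simulation sketch. It is not the author's code, and its numbers will not match the output shown in this article. It assumes exponentially distributed positions clipped to 1 to 100, a small made-up difference between groups, and SciPy's mannwhitneyu as the test.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

def simulate_positions(n, scale):
    """Draw right-skewed rank positions clipped to 1..100 (assumed shape)."""
    return np.clip(np.round(rng.exponential(scale, n)) + 1, 1, 100)

def significance_rate(n_per_group, control_scale=20.0, test_scale=18.0,
                      runs=200, alpha=0.05):
    """Share of simulated experiments in which Mann-Whitney U gives p < alpha."""
    hits = 0
    for _ in range(runs):
        control = simulate_positions(n_per_group, control_scale)
        test = simulate_positions(n_per_group, test_scale)
        _, p_value = mannwhitneyu(test, control, alternative="two-sided")
        hits += int(p_value < alpha)
    return hits / runs

# Compare how often significance is reached as the sample grows.
rates = {n: significance_rate(n) for n in (50, 500, 5000)}
for n, rate in rates.items():
    print(f"n per group={n}: significant in {rate:.1%} of runs")
```

The smaller the expected effect, the larger the sample needed before the rate of significant runs becomes reliably high.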

The detailed code can be found here. Running the code yields the following:

(0.0, 0.05) 0
(9.667, 1.0) 10000
(17.0, 1.0) 20000
(23.0, 1.0) 30000
(28.333, 1.0) 40000
(38.0, 1.0) 50000
(39.333, 1.0) 60000
(41.667, 1.0) 70000
(54.333, 1.0) 80000
(51.333, 1.0) 90000
(59.667, 1.0) 100000
(63.0, 1.0) 110000
(68.333, 1.0) 120000
(72.333, 1.0) 130000
(76.333, 1.0) 140000
(79.667, 1.0) 150000
(81.667, 1.0) 160000
(82.667, 1.0) 170000
(85.333, 1.0) 180000
(91.0, 1.0) 190000
(88.667, 1.0) 200000
(90.0, 1.0) 210000
(90.0, 1.0) 220000
(92.0, 1.0) 230000

To interpret these results, take the line (39.333, 1.0) 60000 as an example:

  • 39.333: The proportion of simulated experiments, as a percentage, in which statistical significance is achieved.
  • 1.0: Statistical power, or the probability that the test correctly rejects the null hypothesis.
  • 60000: Sample size.

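Experience indicates that significance can be reached prematurely, so aim for a sample size at which significance is achieved in at least 90% of simulated experiments. In the output above, the rate stays at 90% or higher once the sample reaches roughly 220,000 data points, so that is the target for this example.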

Failing to account for minimum sample size can lead to premature conclusions, wasted resources, and damaged credibility.

Assign and Implement

With the minimum sample size determined, start assigning URLs to test and control groups to evaluate your SEO theory. Using NumPy's np.where() function, partition your subjects based on criteria like URL pattern, content type, or keywords in the title, depending on the SEO theory under test.

Use the code available here. This approach can be applied prospectively or retrospectively, provided no other changes affect the test’s validity.
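As a small illustration of the assignment step, the sketch below splits URLs by pattern. The URLs and the rule that "/blog/" pages form the test group are hypothetical.

```python
import numpy as np
import pandas as pd

pages = pd.DataFrame({
    "page": ["/blog/python-seo", "/blog/split-tests", "/products/widget", "/about"],
})

# Hypothetical rule: URLs containing "/blog/" received the change under test.
is_test = pages["page"].str.contains("/blog/", regex=False)
pages["group"] = np.where(is_test, "test", "control")

print(pages)
print(pages["group"].value_counts())
```

The same pattern works for any boolean condition, such as a title keyword match or a content-type column.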


Run the Test

Once data is collected, you can run the test. For rank position, the Mann-Whitney U test is suitable because it is non-parametric and does not assume normally distributed data. If another metric is used (e.g., clicks), a different statistical model may be required.
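A minimal sketch of such a test, using SciPy's mannwhitneyu on simulated stand-in data, might look like the following. The group sizes and position ranges are invented for illustration, so the printed values will differ from any real results.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Simulated stand-ins for the rank positions of each group (assumed ranges).
test_positions = rng.integers(1, 15, size=60)
control_positions = rng.integers(1, 60, size=300)

# Two-sided test: are the two groups' positions drawn from the same distribution?
stat, p_value = mannwhitneyu(test_positions, control_positions,
                             alternative="two-sided")

print("Mann-Whitney U Test Results")
print(f"MWU Statistic: {stat}")
print(f"P-Value: {p_value}")
print("Additional Summary Statistics:")
for name, grp in (("Test Group", test_positions), ("Control Group", control_positions)):
    print(f"{name}: n={len(grp)}, mean={grp.mean():.2f}, std={grp.std(ddof=1):.2f}")
```

Because the test compares ranks rather than means, it tolerates the long right-hand tail typical of position data.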

Code to run the test is provided here. Upon running the test, you can print the results:

Mann-Whitney U Test Results
MWU Statistic: 6870.0
P-Value: 0.013576443923420183
Additional Summary Statistics:
Test Group: n=122, mean=5.87, std=2.37
Control Group: n=3340, mean=22.58, std=20.59

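The results above show the test group ranking about 17 positions higher on average than the control group (mean 5.87 versus 22.58). The p-value of roughly 0.014 is below the conventional 0.05 threshold, so the difference is statistically significant and supports the SEO theory. However, collecting additional data may be necessary to confirm the results' robustness.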

Split Testing Demonstrates Skills, Knowledge, and Experience

This article has outlined the process of SEO hypothesis testing, highlighting the considerations and data requirements for valid SEO tests. Mastering this approach can help demonstrate your SEO skills, knowledge, and experience effectively.

For an in-depth exploration, including more advanced topics like split A/A and split A/B testing, consider my Data Science for SEO video course.

SEO professionals often take certain knowledge for granted, but clients may challenge this knowledge. Split test methods are invaluable for showcasing your SEO skills convincingly.
