Google Advises Websites to Use Robots.txt to Block Action URLs

  • June 11, 2024
  • SEO

In a LinkedIn post, Gary Illyes, an Analyst at Google, reiterated longstanding advice for website owners: use the robots.txt file to block web crawlers from URLs that trigger actions such as adding items to carts or wishlists.

Illyes highlighted a common complaint: unnecessary crawler traffic overloading servers, often because search engine bots crawl URLs intended for user-initiated actions.

He stated:

“Looking at what we’re crawling from the sites in the complaints, way too often it’s action URLs such as ‘add to cart’ and ‘add to wishlist.’ These are useless for crawlers, and you likely don’t want them crawled.”

To prevent this unnecessary server load, Illyes recommended using robots.txt to disallow URLs containing parameters such as “?add_to_cart” or “?add_to_wishlist.”

He provided an example:

“If you have URLs like:

You should probably add a disallow rule for them in your robots.txt file.”

Although using the HTTP POST method can also deter crawling of such URLs, Illyes noted that crawlers can still make POST requests, so robots.txt remains advisable.
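Based on the parameters Illyes mentions, such a disallow rule might look like the following. This is a sketch, assuming the cart and wishlist actions are exposed as query parameters; note that the “*” wildcard syntax is supported by Googlebot but is an extension beyond the original standard, so not every crawler honors it:

```
User-agent: *
# Block action URLs that only add items to a cart or wishlist
Disallow: /*?*add_to_cart
Disallow: /*?*add_to_wishlist
```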

Reinforcing Decades-Old Best Practices

Alan Perkins, who participated in the discussion, highlighted that this advice echoes web standards established in the 1990s for similar reasons.

Referencing the original “A Standard for Robot Exclusion” document:

“In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren’t welcome for various reasons…robots traversed parts of WWW servers that weren’t suitable, e.g., very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).”

The robots.txt standard, which proposes rules to restrict well-behaved crawler access, emerged as a “consensus” solution among web stakeholders back in 1994.

Obedience & Exceptions

Illyes affirmed that Google’s crawlers fully adhere to robots.txt rules, with rare, comprehensively documented exceptions for “user-triggered or contractual fetches.”

This adherence to the robots.txt protocol has been a cornerstone of Google’s web crawling policies.

Why This Matters

Although the advice might appear basic, the renewed emphasis on this decades-old best practice underscores its ongoing significance.

By leveraging the robots.txt standard, websites can prevent overzealous crawlers from consuming bandwidth with unproductive requests.

Impact on Your Website

Whether you manage a small blog or a large e-commerce platform, following Google’s advice to use robots.txt for blocking crawler access to action URLs can provide several benefits:

  • Reduced Server Load: By preventing crawlers from accessing URLs that initiate actions like adding items to carts or wishlists, you can significantly reduce unnecessary server requests and bandwidth usage.
  • Improved Crawler Efficiency: Giving crawlers clear instructions in your robots.txt file about which URLs to avoid helps focus crawling and indexing on the content you want prioritized.
  • Better User Experience: With server resources concentrated on actual user actions rather than redundant crawler hits, users will likely experience faster load times and smoother functionality.
  • Compliance with Standards: Implementing this guidance ensures your site aligns with the widely accepted robots.txt protocol, which has been an industry best practice for decades.

Reassessing your robots.txt directives could be a straightforward yet impactful step for websites seeking better control over crawler activity.
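Before deploying new rules, you can sanity-check them locally. The sketch below uses Python’s standard `urllib.robotparser`; the rules and URLs are hypothetical, and note that this parser matches only plain path prefixes, not Google’s “*” wildcard extension, so path-based action URLs are used here:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules blocking path-based action URLs.
# urllib.robotparser does not implement Google's "*" wildcard
# extension, so the rules use plain path prefixes.
rules = """\
User-agent: *
Disallow: /cart/add
Disallow: /wishlist/add
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The action URL is blocked; an ordinary product page is not.
print(parser.can_fetch("*", "https://example.com/cart/add?item=42"))  # False
print(parser.can_fetch("*", "https://example.com/product/42"))        # True
```

Running this kind of check against a staging copy of your robots.txt helps confirm that disallow rules block only the action URLs, not the pages you want indexed.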

Illyes’ insights suggest that the longstanding robots.txt rules remain highly relevant in today’s web environment.

Featured Image: BestForBest/Shutterstock
