CRAWL OPTIMIZATION

The crawl optimization chapter explains how to help search engines discover a website’s documents as efficiently as possible, a key SEO concern for large sites such as ecommerce stores. Important pages should be easy to reach, while less important pages should not waste “crawl budget” or create crawler traps. I explain concepts such as flat architecture, sitemaps and crawler guiding. Below is a snippet from this chapter:

Search engines assign a crawl budget to each website, depending on its authority. A website’s authority is roughly proportional to its PageRank. The concept of crawl budget is important for ecommerce websites because they usually comprise a huge number of URLs, from tens of thousands to millions. If the technical architecture sends search engine crawlers (also known as robots, bots or spiders) into infinite loops or traps, the crawl budget will be wasted on pages that matter to neither users nor search engines, and important pages may be left out of search engine indices.
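To make the idea of a crawler trap concrete, here is a minimal Python sketch that flags URLs whose query parameters are likely to multiply into a near-infinite crawl space. The parameter names and the facet threshold are illustrative assumptions, not rules; every site has its own trap patterns.

```python
from urllib.parse import parse_qs, urlparse

# Query parameters that typically multiply URL variations without
# adding unique content. Illustrative names only; audit your own site.
TRAP_PARAMS = {"sort", "sessionid", "view"}
MAX_FACETS = 2  # stacking many facet filters explodes the URL space

def looks_like_crawler_trap(url: str) -> bool:
    """Heuristic: is this URL likely to waste crawl budget?"""
    params = parse_qs(urlparse(url).query)
    if TRAP_PARAMS & params.keys():
        return True
    return len(params) > MAX_FACETS

# Hypothetical URLs: one clean category page, one light filter, one trap.
urls = [
    "https://www.example.com/shoes/",
    "https://www.example.com/shoes/?color=red",
    "https://www.example.com/shoes/?color=red&size=9&sort=price&sessionid=a1",
]
for url in urls:
    print("TRAP" if looks_like_crawler_trap(url) else "OK  ", url)
```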

Crawl optimization gives very large websites the opportunity to get more of their important pages indexed and their low-PageRank pages crawled more frequently.[i]

It’s true that the number of URLs Google can index at crawl time increased dramatically after the introduction of Google’s powerful Percolator[ii] architecture with the “Caffeine” update.[iii] However, it is still critical to monitor where search engine bots spend their time on your website and to prioritize crawling accordingly.

Before we begin, it is important to understand that crawling and indexing are two different processes. Crawling means simply fetching files from websites; indexing means analyzing those files and deciding whether they are worthy of inclusion. So, even if search engines crawl a page, they won’t necessarily index it.
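Because the fetch and the indexing decision are separate steps, you can observe the difference directly: a page may return 200 OK yet carry a noindex signal. Here is a minimal Python sketch, standard library only, that fetches a page and reports the most common noindex signals; the URL is a placeholder.

```python
import re
import urllib.request

def report_indexability(url: str) -> None:
    """Fetch a page (the crawl step) and report signals that would
    keep it out of the index despite a successful crawl."""
    req = urllib.request.Request(url, headers={"User-Agent": "index-check/0.1"})
    with urllib.request.urlopen(req) as resp:
        header = resp.headers.get("X-Robots-Tag", "") or ""
        body = resp.read(200_000).decode("utf-8", errors="replace")

    # noindex can arrive via an HTTP header or a robots meta tag.
    # Simplified pattern: assumes name= appears before content=.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
        body,
        re.IGNORECASE,
    )
    if "noindex" in header.lower() or meta:
        print(f"Crawlable but carries a noindex signal: {url}")
    else:
        print(f"No noindex signal found: {url}")

report_indexability("https://www.example.com/")  # placeholder URL
```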

Several factors can influence crawling: the website’s structure, internal linking, domain authority, URL accessibility, content freshness, update frequency, use of product and category feeds, and the crawl rate settings in webmaster accounts.
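To illustrate the feeds item above: an XML sitemap is the simplest crawler-guiding feed, and its lastmod values hint at content freshness. Here is a minimal generation sketch in Python; the product URLs and dates are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical product pages and their last-modified dates.
PAGES = [
    ("https://www.example.com/shoes/red-runner/", "2014-06-01"),
    ("https://www.example.com/shoes/blue-walker/", "2014-06-03"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in PAGES:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = loc
    ET.SubElement(entry, "lastmod").text = lastmod  # freshness hint for crawlers

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```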

Before detailing these, let’s talk about tracking and monitoring search engine bots.
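As a preview of what that monitoring looks like in practice, here is a minimal Python sketch that tallies Googlebot requests per URL from a combined-format access log. The sample log lines are fabricated for the demo; on a real site you would read your own access.log and verify bot IPs with reverse DNS, since user-agent strings can be spoofed.

```python
import re
from collections import Counter

# Matches the request path and user-agent of a combined-format log line.
LINE_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+".*"(?P<agent>[^"]*)"$')

def googlebot_hits(log_lines):
    """Count how often Googlebot requested each path."""
    hits = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("path")] += 1
    return hits

# Two made-up log lines for demonstration purposes.
sample = [
    '66.249.66.1 - - [10/Jun/2014:13:55:36 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jun/2014:13:55:40 +0000] "GET /shoes/?sort=price HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
for path, count in googlebot_hits(sample).most_common():
    print(count, path)
```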

[i] Google Patent On Anchor Text And Different Crawling Rates, http://www.seobythesea.com/2007/12/google-patent-on-anchor-text-and-different-crawling-rates/

[ii] Large-scale Incremental Processing Using Distributed Transactions and Notifications, http://research.google.com/pubs/pub36726.html

[iii] Our new search index: Caffeine, http://googleblog.blogspot.ca/2010/06/our-new-search-index-caffeine.html

About the Author

I am Traian, the author of the Ecommerce SEO book. I have more than 12 years of research and practical experience in SEO, my favorite area of digital marketing. When consulting and providing recommendations, I like to take into account more than just the search engine: I also consider information architecture, user interface, and user experience/usability issues.
Feel free to connect with me on LinkedIn, or follow me on Google+ or Twitter.

Join The Ecommerce SEO Newsletter!

For updates and exclusive content, sign up for my newsletter!