Google's New Spider - Friend or Foe?

20 Feb | Search Engines

by Rob Sullivan
http://www.enquiro.com

Ever since we first heard of Big Daddy, Google's new data infrastructure, I've been watching for anomalies across the Googlesphere.

And there was something interesting that began even before Big Daddy was announced.

There is a new Googlebot roaming the web, and it acts like no other Googlebot before it.

I've seen reports of this Googlebot doing weird things that no other crawler seems able to do.
First, let's start with identification. The new Googlebot announces itself with this user agent string:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
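If you want to spot this new bot in your own server logs, a simple user-agent check will do it. This is a minimal sketch of my own; the function name and regex are illustrations, not anything Google publishes, though the user agent string itself is the one shown above.

```python
import re

# The user agent string is Google's published one; the regex and function
# name below are my own illustration for log filtering.
NEW_GOOGLEBOT = re.compile(
    r"Mozilla/5\.0 \(compatible; Googlebot/2\.1; "
    r"\+http://www\.google\.com/bot\.html\)"
)

def is_new_googlebot(user_agent: str) -> bool:
    """Return True if a request's User-Agent matches the new Mozilla-based Googlebot."""
    return bool(NEW_GOOGLEBOT.search(user_agent))

# The new Mozilla-based bot matches; the old-style UA does not.
print(is_new_googlebot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_new_googlebot("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # False
```

Because the new bot identifies itself as Mozilla-compatible while the old one does not, this one test is enough to split the two crawlers' traffic apart in a log file.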

Now the first difference you'll notice is that it identifies itself as Mozilla-compatible. This makes sense, since Google has been busy hiring Firefox developers, and the Firefox browser is built on the Mozilla rendering engine; it makes sense to put those developers to work building a new crawler.

Originally the industry speculated that Google was going to get into the browser wars, but now that doesn't appear to be the case (at least to me).

So why would Google want to build a new crawler? More importantly why would they want to do it on the Mozilla engine?

Well, consider that the old Googlebot is built on Lynx, a fairly old text-based web browser.

Lynx is limited in what it can do: because of its text-based nature it can't execute JavaScript and it can't render CSS. It is a nice, small, fast web browser, but as a web user you give up too much by using it.

And it is these shortcomings that have been issues for all crawlers, not just Google. You see, all the engines use similar frameworks for their crawlers, therefore they have similar limitations.

Why use Mozilla?

Well, for one thing, it's open source, which means anyone can use the codebase. For another, the Mozilla base is much more advanced than Lynx: Mozilla CAN handle JavaScript, CSS and more.

In fact, Mozilla is a modern browser engine capable of rendering essentially any modern web content.

Right away you can see why Google would want to build this new crawler: its increased capabilities allow for more advanced crawling and indexing of the web.

So what anomalies am I seeing?

While I can't confirm too much of this at this point, my gut is telling me it's mostly true.

For one thing, this new spider is a spider on steroids: it is hyperactive in its crawling. I've already had two clients whose websites went down because of its activity.

To put this in context, let me tell you how the new Googlebot compares to the old Googlebot on just one client's site:

In a random three-day sample of Googlebot activity on that site, the Mozilla bot requested almost 99,000 pages. Over the same period the old bot requested only 14,500 pages; the new bot requested almost seven times as many.
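You can reproduce this kind of comparison yourself by tallying hits per user agent in your access logs. A quick sketch, using the two user agent strings and the figures from the site above as stand-in data (the function and constants are my own, not from any log tool):

```python
from collections import Counter

# The two published Googlebot user agent strings.
OLD_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def crawl_ratio(user_agents):
    """Given one User-Agent string per logged request, return new-bot hits / old-bot hits."""
    counts = Counter(user_agents)
    return counts[NEW_UA] / counts[OLD_UA]

# Stand-in data matching the three-day figures quoted above: 99,000 new-bot
# requests vs 14,500 old-bot requests.
sample = [NEW_UA] * 99_000 + [OLD_UA] * 14_500
print(round(crawl_ratio(sample), 1))  # 6.8
```

That ratio, 99,000 / 14,500 ≈ 6.8, is where the "almost seven times" figure comes from.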

Not only that, there have been reports of the new bot filling out and submitting forms. I've asked a few clients of mine who run forms whether they can confirm this, and my gut tells me it's so. It's not a random act, or someone spoofing an IP or user agent: this is truly an intelligent spider that can emulate human actions.

And it makes sense, considering that Google wants its users' experience to be the best on the web. They are going to want to ensure that the sites which show up at the top of the SERPs are also the most user friendly: the CSS must look right, the JavaScript must work properly, and forms must not contain buggy code.

In the end I think this new crawler is going to catch the web by surprise. Most of the sites we work on, we build anticipating how the old Googlebot will react to them; if there is indeed a new Googlebot out there (and I do believe there is), we are going to have to rethink how we code sites.

If we are hiding things in CSS or JavaScript, those tactics will no longer work. If we are hiding text in div layers or behind images, those too will no longer work.
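To see why those tactics stop working, consider the simplest case: text hidden with an inline CSS rule. A text-based crawler never evaluates the style, but a rendering crawler can. Here is a toy check of my own devising (the regex and function are illustrations, not anything Google has documented) for the kind of hidden text a rendering bot could flag:

```python
import re

# My own illustrative check: flag elements whose inline style attribute
# hides them via display:none or visibility:hidden.
HIDDEN_STYLE = re.compile(
    r'style\s*=\s*"[^"]*(display\s*:\s*none|visibility\s*:\s*hidden)',
    re.IGNORECASE,
)

def has_css_hidden_text(html: str) -> bool:
    """Return True if the page contains inline-CSS-hidden markup."""
    return bool(HIDDEN_STYLE.search(html))

print(has_css_hidden_text('<div style="display:none">keyword keyword keyword</div>'))  # True
print(has_css_hidden_text('<div>visible content</div>'))  # False
```

A real rendering crawler would of course go much further, resolving external stylesheets and computed styles, but even this toy version catches what a Lynx-style bot is structurally blind to.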

Essentially what Google is doing is closing a few more loopholes. This will make it harder for some black hatters but as long as you follow Google's Webmaster Guidelines you will be alright.


Rob Sullivan
Head Organic Search Strategist
Enquiro Full Service Search Engine Marketing

Copyright 2003 - 2006 - Searchengineposition Inc.

