Posted on 04/08/2011 12:15:11 PM PDT by FreeAtlanta
My host has been throttling the CPU on my sites like crazy this last week. I started investigating the logs, and this IP (66.249.72.198) with this query http://dougsearch.dougsmugs.com/web/?query=obama+executive+order+to+seal+records came up over and over and over again on my search site. I got a little suspicious and blocked the IP. I then decided to find out who owns the IP (http://whois.domaintools.com/66.249.72.198). It is Google!
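For anyone wanting to do the same kind of log digging, here is a rough Python sketch that tallies which client and URL show up most often. It assumes Apache/Nginx "combined"-style lines that begin with the client IP; the function name `top_clients` and the regex are illustrative, not something from my actual setup.

```python
import re
from collections import Counter

# Assumption: each line starts with the client IP and contains a quoted
# request such as "GET /web/?query=... HTTP/1.1" (combined log format).
LINE_RE = re.compile(r'^(\S+) .*?"(?:GET|POST|HEAD) (\S+)')

def top_clients(lines, n=5):
    """Return the n most frequent (ip, path) pairs with their hit counts."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not look like requests
            counts[match.groups()] += 1
    return counts.most_common(n)
```

Feeding it an open log file (for example `top_clients(open("access.log"))`, where the path is a placeholder) should put a repeat offender like the one above at the top of the list.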
I hate blocking their crawler for all my sites, but they are beating me up. Normally I would think it is some crawler glitch, but the search (obama executive order to seal records) is a little suspicious, knowing their hard left leanings and bias against conservatives and anything negative about Obama.
Alas, I will leave the block in and see if the CPU throttling gets better. Google has the horsepower to bring down any site they want to.
I might have to turn off dougsearch.com if the throttling doesn't stop. It doesn't make me any money and Yahoo is now releasing BOSS V2 as a paid plan. The throttling is hurting my business site DougsMugs.com where I sell my own fresh roast coffee.
I enjoy the search engine coding, but I currently don't have time to pursue it, and paying for the results doesn't make much sense to me right now.
Well, we all know that Google is as in with Obama as GE and GM.
There are tags that you can put in your pages so Google doesn’t index or cache them.
Their crawler changes IP addresses, so adding the tags is better than trying to block specific IPs.
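Since the addresses move around, one way to confirm that a given request really came from Google, without hard-coding IPs, is a reverse DNS lookup followed by a forward lookup. A minimal Python sketch, assuming the crawler hostnames end in googlebot.com or google.com; the function name and the injectable `reverse`/`forward` parameters are illustrative only.

```python
import socket

def is_google_crawler(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:                        # no PTR record or lookup failure
        return False
    # Assumption: genuine crawler hosts live under these two domains.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in forward(hostname)[2]  # hostname must map back to the IP
    except OSError:
        return False
```

The forward lookup matters because anyone who controls their own reverse DNS can make an address claim to be a Google host; only the real owner of the domain can make the name resolve back to that address.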
Check the user-agent:
“Google has several other user-agents, including Feedfetcher (user-agent Feedfetcher-Google). Since Feedfetcher requests come from explicit action by human users who have added the feeds to their Google home page or to Google Reader, and not from automated crawlers, Feedfetcher does not follow robots.txt guidelines. You can prevent Feedfetcher from crawling your site by configuring your server to serve a 404, 410, or other error status message to user-agent Feedfetcher-Google. More information about Feedfetcher.”
http://www.google.com/support/webmasters/bin/answer.py?answer=182072
(sorry, too lazy to format html - you can copy/paste)
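If the site happened to run behind a Python WSGI server, serving a 404 to that user-agent could look roughly like the sketch below. That is purely an assumption for illustration (nothing in this thread says what the site runs on), and `block_feedfetcher` is a made-up helper name.

```python
def block_feedfetcher(app, blocked="Feedfetcher-Google"):
    """Wrap a WSGI app so requests from the blocked user-agent get a 404."""
    def middleware(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        if blocked in environ.get("HTTP_USER_AGENT", ""):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)
    return middleware
```

The same idea carries over to a web server config or any other stack: match on the user-agent string and return an error status instead of the page.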
Let’s see.
The website has a Google PR of 3, Alexa Ranking of 8.5 million, and no ranking by either Compete or Quantcast.
Highly doubt you’re on Google’s hit list at this point (if such a list exists).
This is treading into tinfoil hat territory.
I doubt they would deliberately attack an unknown search site. I think their crawler probably found the link somewhere and is just pounding the heck out of it due to some mess-up in their query algorithm.
Whatever it was, the CPU throttling seems to be settling down for now. I am going to keep an eye on it.
Between this, upgrading to Prestashop 1.4, moving my retail domain from www.dougsmugscoffee.com to www.dougsmugs.com, and a bunch of issues flipping the SSL certificate, it has been a trying couple of weeks.
lol, yeah, I don’t think they know I exist... although, they did start using my idea of allowing users to exclude domains from search results. :-) It was likely serendipity. I am just perturbed they smash my site.
Thank you,
That is probably the answer. I will implement it. I really don’t want to fall out of all of Google’s results.
I can probably also knock them out of dougsearch.com with the robots.txt file and let them continue to scan dougsmugs.com.
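A quick way to sanity-check that plan before deploying it is to run the robots.txt rules through Python's standard `urllib.robotparser`. The two-line file below is a guess at what such a block might look like, not something I have put live, and `allowed` is just a helper name for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dougsearch.com host only; robots.txt is
# read per host, so dougsmugs.com would need its own file to be affected.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

def allowed(robots_txt, agent, url):
    """Return True if the given user-agent may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, `allowed(ROBOTS_TXT, "Googlebot", "http://dougsearch.com/web/?query=test")` comes back `False`, while a user-agent that is not named in the file is still allowed.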
I take it that obama executive order to seal records is a trending term in Google? ... lol
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.