One of the most common reasons people disable ModSecurity is that their logs show it continually blocking Googlebot. If Google's bot cannot crawl your site, you will have trouble getting visitors to it through search. The fix is simply to create a rule that allows Googlebot to access and read your site. The log entries below have been edited for readability. The first part of the entry, "--ea572823-A--", shows that Googlebot has arrived at your site and is requesting a page, "GET /a_page_from_your_site HTTP/1.1" (in practice this will be a specific page from your site). The "--ea572823-H--" section shows that Googlebot was allowed by a custom rule defined in modsecurity_crs_15_customrules.conf.
Note: this example uses version 2.0.5 of the Core Rule Set.
--ea572823-A--
[17/Apr/2010:09:18:05 -0500] ZXnZwUWvb-oAAEqiDqkAAAAR 22.214.171.124 58442 126.96.36.199 80
GET /a_page_from_your_site HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
--ea572823-H--
Message: Access allowed (phase 2). Pattern match "Googlebot" at REQUEST_HEADERS:User-Agent. [file "/etc/httpd/conf.d/modsecurity/modsecurity_crs_15_customrules.conf"] [line "25"]
Producer: ModSecurity for Apache/2.5.12 (http://www.modsecurity.org/); core ruleset/2.0.5.
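To confirm the rule is firing, you can pull the relevant messages out of the audit log. The helper below is a small sketch; the log location depends on your SecAuditLog setting, so pass the path explicitly.

```shell
# Sketch: print the "Access allowed" messages for Googlebot from a
# ModSecurity audit log. Pass your audit-log path (set by SecAuditLog,
# often /var/log/httpd/modsec_audit.log) as the first argument.
allowed_googlebot_hits() {
  grep 'Access allowed' "$1" | grep 'Googlebot'
}
```

Usage: `allowed_googlebot_hits /var/log/httpd/modsec_audit.log`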
Solution for allowing Googlebot
You will need to add a custom rule to modsecurity_crs_15_customrules.conf so that Googlebot can crawl your pages.
SecRule REQUEST_HEADERS:User-Agent "Googlebot" "allow"
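Keep in mind that any client can claim to be Googlebot simply by sending that User-Agent header, so this match is a convenience rather than authentication. A slightly more explicit variant of the same rule (a sketch; the phase and nolog actions are optional additions, not part of the original rule) would be:

```apache
# Allow requests whose User-Agent contains "Googlebot".
# Caveat: the User-Agent header is trivially spoofed, so this also
# whitelists any client pretending to be Googlebot.
SecRule REQUEST_HEADERS:User-Agent "Googlebot" "phase:2,allow,nolog"
```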