One of the big reasons people disable ModSecurity is that they see continual blocks on Googlebot in their logs. Obviously, if Google's bot cannot crawl your site, you will have trouble getting people to your site. The resolution is simply to create a rule that allows Googlebot to access and read your site.
The log entry below has been modified for easier reading. The first part of the entry, "--ea572823-A--", shows that Googlebot has arrived on your site and is trying to access a page, "GET /a_page_from_your_site HTTP/1.1"; this will of course be a specific page from your site. The "--ea572823-H--" section shows that Googlebot has been allowed by a specific rule created in modsecurity_crs_15_customrules.conf.
Note that this is version 2.0.5 of the core ruleset.
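If you want to confirm that it really is Googlebot being blocked before adding a rule, you can search the audit log for its User-Agent. A minimal check, assuming the common audit log location of /var/log/httpd/modsec_audit.log (your SecAuditLog directive may point elsewhere):

grep -n "Googlebot" /var/log/httpd/modsec_audit.log

Blocked requests will carry a "Message: Access denied" line in their H section instead of the "Access allowed" message shown in the entry below.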
--ea572823-A--
[17/Apr/2010:09:18:05 -0500] ZXnZwUWvb-oAAEqiDqkAAAAR 66.249.71.101 58442 69.175.111.250 80
--ea572823-B--
GET /a_page_from_your_site HTTP/1.1
Host: example.com
Connection: Keep-alive
Accept: */*
From: googlebot(at)googlebot.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept-Encoding: gzip,deflate
--ea572823-H--
Message: Access allowed (phase 2). Pattern match "Googlebot" at REQUEST_HEADERS:User-Agent. [file "/etc/httpd/conf.d/modsecurity/modsecurity_crs_15_customrules.conf"] [line "25"]
Producer: ModSecurity for Apache/2.5.12 (http://www.modsecurity.org/); core ruleset/2.0.5.
--ea572823-Z--
Solution for allowing Googlebot
You will need to enter a custom rule in modsecurity_crs_15_customrules.conf so that Googlebot can crawl your pages.
SecRule REQUEST_HEADERS:User-Agent "Googlebot" "allow"
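After adding the rule and restarting Apache, you can verify it from the command line by sending a request with Googlebot's User-Agent string and then checking the audit log for the "Access allowed" message shown above:

curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://example.com/a_page_from_your_site

Keep in mind that the User-Agent header can be spoofed by anyone, which is also why the test above works from any machine. If you want to tighten the rule, a sketch like the following chains the User-Agent match with a check on the client address; the 66.249.x.x range here is an assumption taken from the single log entry above, so check Google's published crawler address ranges before relying on it:

SecRule REQUEST_HEADERS:User-Agent "Googlebot" "phase:1,allow,nolog,chain"
SecRule REMOTE_ADDR "^66\.249\."

With chained rules, the allow action on the first rule only takes effect when every rule in the chain matches.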