Topic: Blocking Bad Bots
Mister Nice
Joined: 07 Nov 2016 Posts: 9 Location: USA
Posted: Mon 07 Nov '16 22:21 Post subject: Blocking Bad Bots
I've used various versions of the code below over several months to try to block bad bots, but I've come to the realization that it never actually works.
My server has a number of virtual hosts, so I'd like to keep the code in httpd.conf rather than in separate .htaccess files, as that makes it much easier to maintain.
Server Info:
Apache Version: Apache/2.2.15 (Unix)
OS: CentOS release 6.2
The code below is an abbreviated extract from my httpd.conf file, with just one virtual host section shown and only a portion of the bots listed:
Code:
<Location *>
    SetEnvIfNoCase User-Agent ".*MJ12bot.*" bad_bot
    SetEnvIfNoCase User-Agent ".*Baiduspider.*" bad_bot
    SetEnvIfNoCase User-Agent ".*Vagabondo.*" bad_bot
    SetEnvIfNoCase User-Agent ".*lwp-trivial.*" bad_bot
    SetEnvIfNoCase User-Agent ".*libwww.*" bad_bot
    SetEnvIfNoCase User-Agent ".*Wget.*" bad_bot
    SetEnvIfNoCase User-Agent ".*XoviBot.*" bad_bot
    SetEnvIfNoCase User-Agent ".*xovibot.*" bad_bot
    SetEnvIfNoCase User-Agent ".*AhrefsBot.*" bad_bot
    SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
    Deny from env=bad_bot
</Location>

<VirtualHost xx.xxx.xx.xxx:80>
    DocumentRoot "/var/www/sites/xxx"
    ServerName www.xxx.com
    ServerAlias xxx.com
    ScriptAlias /cgi-bin/ "/var/www/sites/xxx/cgi-bin/"
    AddType application/x-httpd-php .html .php
    <Directory "/var/www/sites/xxx">
        Order allow,deny
        Allow from all
        Deny from env=bad_bot
        Options FollowSymLinks +ExecCGI +Includes
        RewriteEngine On
        AllowOverride All
        Include "/var/www/sites/xxx/.htaccess"
    </Directory>
    CustomLog "/var/www/sites/logs/xxx_access.log" combined
    ErrorLog "/var/www/sites/logs/xxx_error.log"
</VirtualHost>
I've tried various things with regard to how I write the bots section, such as wildcarding the name, using just the bot name in quotes, or prefixing it with a ^ symbol in the hope of catching a User-Agent that actually begins with the bot name, and so on.
However, nothing I do seems to make the slightest difference: everything is still served to these bots with a 200 for local content, or a 302 if the bot is following a link to off-site content. I'm figuring it should be returning 403 errors.
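For a single bot, the variations I've tried look roughly like this (illustrative lines only, not the exact contents of my config):
Code:
# wildcarded, plain quoted name, and anchored with ^
SetEnvIfNoCase User-Agent ".*MJ12bot.*" bad_bot
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
SetEnvIfNoCase User-Agent "^MJ12bot" bad_bot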
Any assistance appreciated.
Many thanks.
admin Site Admin
Joined: 15 Oct 2005 Posts: 692
Mister Nice
Joined: 07 Nov 2016 Posts: 9 Location: USA
Posted: Mon 07 Nov '16 22:46 Post subject:
Thank you for the link, which is interesting, but each method there is different from the one I'm trying to implement, and I'd still like to know why the code I've tried doesn't work.
That said, I'm not averse to implementing one of the suggestions on the page you cited. I'm just not too sure about the <RequireAll> method, as I don't really understand how or what it's doing; it also seems to use <Directory> sections, which are presumably specific to each virtual host, whereas I'd like a global version covering all virtual hosts (hence the generic <Location *> block in my own attempts).
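If I'm reading that page correctly, the <RequireAll> approach would look something like the sketch below (my own guess at it, using 2.4-style directives, which I gather wouldn't apply to my 2.2.15 server anyway):
Code:
SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
<Directory "/var/www/sites/xxx">
    <RequireAll>
        # allow everyone, except requests tagged as bad_bot
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Directory>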
As for the second method, using RewriteCond, I'm not really sure how I would put it into httpd.conf, as I've only ever used RewriteEngine in .htaccess files, though presumably there's some relatively simple way of doing so.
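At a guess, the RewriteCond version in httpd.conf (or inside each <VirtualHost>) would be something along these lines, though I'm unsure about the exact placement:
Code:
RewriteEngine On
# return 403 Forbidden when the User-Agent matches any listed bot, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|AhrefsBot|SemrushBot) [NC]
RewriteRule ^ - [F,L]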
Thanks again. Hoping someone can assist.