Apache Lounge
Webmasters

Topic: Blocking Bad Bots
Mister Nice



Joined: 07 Nov 2016
Posts: 9
Location: USA

Posted: Mon 07 Nov '16 22:21    Post subject: Blocking Bad Bots

Over several months I've used various versions of the code below to try to block bad bots, but I've come to realize that it never actually works.

My server has a number of virtual hosts, so I'd like to keep the code in httpd.conf rather than in separate .htaccess files, which makes it much easier to maintain.

Server Info:
Apache Version: Apache/2.2.15 (Unix)
OS: CentOS release 6.2

The code below is an abbreviated extract from my httpd.conf file, showing just one virtual host section and only a portion of the bot list:

Code:
<Location *>
SetEnvIfNoCase User-Agent ".*MJ12bot.*" bad_bot
SetEnvIfNoCase User-Agent ".*Baiduspider.*" bad_bot
SetEnvIfNoCase User-Agent ".*Vagabondo.*" bad_bot
SetEnvIfNoCase User-Agent ".*lwp-trivial.*" bad_bot
SetEnvIfNoCase User-Agent ".*libwww.*" bad_bot
SetEnvIfNoCase User-Agent ".*Wget.*" bad_bot
SetEnvIfNoCase User-Agent ".*XoviBot.*" bad_bot
SetEnvIfNoCase User-Agent ".*xovibot.*" bad_bot
SetEnvIfNoCase User-Agent ".*AhrefsBot.*" bad_bot
SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
Deny from env=bad_bot
</Location>

<VirtualHost xx.xxx.xx.xxx:80>
DocumentRoot "/var/www/sites/xxx"
ServerName www.xxx.com
ServerAlias xxx.com

ScriptAlias /cgi-bin/   "/var/www/sites/xxx/cgi-bin/"
AddType application/x-httpd-php .html .php

<Directory "/var/www/sites/xxx">
Order allow,deny
Allow from all
Deny from env=bad_bot
Options FollowSymLinks +ExecCGI +Includes
RewriteEngine On
AllowOverride All
Include "/var/www/sites/xxx/.htaccess"
</Directory>

CustomLog "/var/www/sites/logs/xxx_access.log" combined
ErrorLog  "/var/www/sites/logs/xxx_error.log"
</VirtualHost>


I've tried various ways of writing the bot patterns: wrapping the name in wildcards, using just the bot name in quotes, or prefixing it with a ^ so that it matches when the User-Agent begins with the bot name, and so on.
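For reference, SetEnvIfNoCase matches its pattern as a case-insensitive regular expression anywhere in the header value, so ".*MJ12bot.*" and plain "MJ12bot" match the same User-Agent strings, and the separate XoviBot and xovibot lines are redundant. A ^ anchor only helps when the User-Agent starts with the bot name. Crawlers such as MJ12bot and AhrefsBot typically send strings beginning with "Mozilla/5.0 (compatible; ...", so an anchored pattern would miss them. A condensed, untested sketch of the same list as a single alternation, assuming the same bot names, might look like this:

Code:
# One unanchored, case-insensitive regex instead of one line per bot.
# Leading/trailing ".*" is unnecessary because the match is not anchored.
SetEnvIfNoCase User-Agent "(MJ12bot|Baiduspider|Vagabondo|lwp-trivial|libwww|Wget|XoviBot|AhrefsBot|SemrushBot)" bad_bot

Since every pattern variant behaved the same way, the pattern syntax is probably not the root cause here.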

However, nothing I do seems to make the slightest difference. These bots are still served everything, with a 200 for local content or a 302 when following a link to off-site content. I'd expect them to get 403 errors instead.
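One likely cause is the <Location *> wrapper rather than the patterns. According to the <Location> documentation, neither ? nor * matches a / in the URL-path. Every URL-path begins with /, so <Location *> most likely never matches, the SetEnvIfNoCase lines inside it never run, and bad_bot is never set. The usual way to cover every URL is <Location />. For setting a variable, though, it is simpler to put the SetEnvIfNoCase lines at the top level of httpd.conf, outside any container. Main-server mod_setenvif settings should then apply to every virtual host. A sketch under those assumptions (Apache 2.2 syntax, abbreviated bot list, untested) keeps the per-vhost Deny that is already there:

Code:
# Top level of httpd.conf, outside any <Location> or <VirtualHost>,
# so the variable is set for requests to every virtual host.
SetEnvIfNoCase User-Agent "(MJ12bot|Baiduspider|AhrefsBot|SemrushBot)" bad_bot

<VirtualHost xx.xxx.xx.xxx:80>
DocumentRoot "/var/www/sites/xxx"
ServerName www.xxx.com
ServerAlias xxx.com

<Directory "/var/www/sites/xxx">
# Apache 2.2 (mod_authz_host): with "Order allow,deny", a request that
# matches both Allow and Deny is denied, so flagged bots get a 403.
Order allow,deny
Allow from all
Deny from env=bad_bot
</Directory>
</VirtualHost>

If this works, a test request such as curl -I -A "MJ12bot" http://www.xxx.com/ should come back with a 403. Because AllowOverride All is set, an .htaccess file containing its own Order/Allow/Deny lines may still change the outcome for that directory.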

Any assistance appreciated.

Many thanks.
admin
Site Admin


Joined: 15 Oct 2005
Posts: 692

Posted: Mon 07 Nov '16 22:27

See https://www.apachelounge.com/viewtopic.php?t=5438
Mister Nice



Joined: 07 Nov 2016
Posts: 9
Location: USA

Posted: Mon 07 Nov '16 22:46

Thank you for the link, which is interesting, but each method there differs from the one I'm trying to implement, and I'd like to know why my own code doesn't work.

That said, I'm not averse to trying one of the suggestions on the page you cited. I'm not too sure about the "<RequireAll>" method, though, as I don't really understand what it's doing or how. It also seems to use <Directory> tags, which are presumably specific to each virtual host, whereas I'd like a global version for all virtual hosts (which is why I used the generic <Location *> tag in my own attempts).
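For context, <RequireAll> and the Require forms used with it belong to Apache 2.4 (mod_authz_core). They are not available in the 2.2.15 build listed above, where Order/Allow/Deny is the equivalent. On a 2.4 server, a <RequireAll> block grants access only if none of its directives fails and at least one succeeds. "Require all granted" always succeeds, while "Require not env bad_bot" fails when the bad_bot variable is set. The following is a sketch for one virtual host's directory. It assumes bad_bot is still set by SetEnvIfNoCase lines at the top level of httpd.conf, so only this short block needs repeating per vhost:

Code:
# Apache 2.4 only; not valid syntax on 2.2.
<Directory "/var/www/sites/xxx">
<RequireAll>
# Grants access to everyone...
Require all granted
# ...except requests where SetEnvIfNoCase set bad_bot.
Require not env bad_bot
</RequireAll>
</Directory>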

As for the second method, using RewriteCond, I'm not really sure how to put it into httpd.conf, as I've only ever used RewriteEngine in .htaccess files, though presumably there's a relatively simple way of doing so.
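In httpd.conf, the same RewriteCond/RewriteRule lines can go straight into each <VirtualHost> block after RewriteEngine On. They can also be defined once in the main server context. However, mod_rewrite settings there are not inherited by virtual hosts by default, so each <VirtualHost> then needs RewriteEngine On and RewriteOptions Inherit. The following is an untested sketch of that second layout, assuming mod_rewrite is loaded and using an abbreviated bot list:

Code:
# Main server context (top level of httpd.conf).
RewriteEngine On
# [NC] makes the User-Agent match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|Baiduspider|AhrefsBot|SemrushBot) [NC]
# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule .* - [F]

<VirtualHost xx.xxx.xx.xxx:80>
ServerName www.xxx.com
# Without these two lines the main-server rules above are ignored for this vhost.
RewriteEngine On
RewriteOptions Inherit
</VirtualHost>

Server-level rewrite rules run during URL translation, so they should take effect before any .htaccess file in the document root is consulted.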

Thanks again. Hoping someone can assist.