Apache Lounge
Topic: HTACCESS Rewrite Ideas (Forum Index -> Apache)
Author
Wolf_22



Joined: 28 Jan 2009
Posts: 6

PostPosted: Fri 10 Apr '15 4:25    Post subject: HTACCESS Rewrite Ideas

I'm trying to prevent unnecessary GET requests, coming from constantly changing IP addresses, from being processed by my CMS. They suck up server resources whenever the app processes them, so if possible I'd like to block them with HTACCESS so each request is stopped before any intensive processing happens.

What happens is that an IP address will make a GET request for, say, "blah/test" or "blah/test2" but nothing else (no site assets like images or CSS/JavaScript files or even other pages). After this request, another IP address will then make an equivalent kind of request, and so on, and so on... All of them have similar if not identical user agent strings but they're always worthless requests that do nothing but waste CPU and RAM. I'm assuming it's just some idiotic SPAM bot because of this.

The following is a sample of the kind of requests I'm seeing:
Quote:
79.133.XXX.XXX - - [21/Mar/2015:11:40:13-0600] "GET /?x=blah/test2 HTTP/1.1" 200 2001 "http://www.<my domain name>.com/?x=blah/test" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60"


One approach I've taken in trying to eliminate this kind of request is using HTACCESS to set a cookie that I then test in a RewriteCond to drive a RewriteRule. For example, if the cookie value isn't what I expect, I block or redirect them; whatever. I think this helps, but between the sometimes-differing user agent strings and the time and thought it takes to test everything, it creates some headaches.
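For what it's worth, a minimal sketch of that cookie approach in mod_rewrite might look like the following. The domain `.example.com`, the 60-minute lifetime, and the `/static-block.html` target are made-up values for illustration; the cookie name/value `arealua=true` is taken from later in the thread.

```apache
RewriteEngine On

# Hand every visitor a marker cookie via the CO flag
# (syntax: CO=name:value:domain[:lifetime[:path]])
RewriteRule ^ - [CO=arealua:true:.example.com:60:/]

# Requests that arrive without the cookie get sent to a cheap static page.
# Caveat: the very first request from anyone carries no cookie yet, so this
# check alone will also catch legitimate first-time visitors.
RewriteCond %{HTTP_COOKIE} !arealua=true [NC]
RewriteRule ^ /static-block.html [L]
```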

Has anyone here dealt with this kind of thing (I know that's a loaded question because I already know you have)? If so, how did you treat / stop it?

My plan of attack right now is to use the cookie approach (only refactored or refined to be more inclusive of the form these bots use) and use a RewriteRule to redirect the lame requests to a static, non-resource-intensive 400 kind of page. (And yes, I know the static page will still use resources, but it's better than firing off the multiple levels of processing my CMS uses, right?)

Thoughts?
glsmith
Moderator


Joined: 16 Oct 2007
Posts: 2268
Location: Sun Diego, USA

PostPosted: Fri 10 Apr '15 5:12

Any thoughts by someone would probably depend on what version of Apache you are using. It does make a difference.

This is an Apache question, not really a coding or scripting question. I'll be moving it to the Apache forum after you respond, so be prepared to find it there.
Wolf_22



Joined: 28 Jan 2009
Posts: 6

PostPosted: Fri 10 Apr '15 5:28

Ah, my bad. Move it away then. Smile

My version of Apache is 2.2.22 (Workstation) and 2.2.29 (Production).
glsmith
Moderator


Joined: 16 Oct 2007
Posts: 2268
Location: Sun Diego, USA

PostPosted: Fri 10 Apr '15 7:16

Bummer, with 2.4 you could redirect them back to themselves using <If>. However, this works for me in 2.2.29.

I created a folder called blackhole (cause you got to rewrite to somewhere) and just put a blank index.html file in there with
Code:
<html><head><title>WHATAPUTZ!</title></head><body bgcolor="#000000"></body></html>


The ?$1 is very important; without it I'd get a 503 Service Temporarily Unavailable because it keeps redirecting to itself.

Code:
RewriteEngine On
RewriteCond %{QUERY_STRING} blah [NC]
RewriteRule (.*) /blackhole/?$1    [L,R=301]


Thinking about it now, may not even need the folder. Gonna play some more.

Ja, this worked too
Code:
RewriteEngine On
RewriteCond %{QUERY_STRING} blah [NC]
RewriteRule (.*) /whataputz!.html?$1 [L,R=301]


Quote:
Has anyone here dealt with this kind of thing (I know that's a loaded question because I already know you have)? So if so, how did you treat / stop it?


Not I luckily but plenty of other types of pains in the butt.
glsmith
Moderator


Joined: 16 Oct 2007
Posts: 2268
Location: Sun Diego, USA

PostPosted: Fri 10 Apr '15 7:31

The above assumes ?x=blah is never valid in your CMS.
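If some x=blah/... values are legitimate, one way to keep a rule like the one above from touching them is an extra whitelist condition (the page names below are hypothetical placeholders):

```apache
RewriteEngine On
# Only act on query strings containing "blah"...
RewriteCond %{QUERY_STRING} blah [NC]
# ...but skip any x= value the CMS actually serves (hypothetical names)
RewriteCond %{QUERY_STRING} !^x=blah/(realpage|otherpage)$ [NC]
RewriteRule .* /blackhole/? [L,R=301]
```

Stacked RewriteConds are ANDed by default, so the rule only fires when the query string looks blah-like and is not on the whitelist.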
Wolf_22



Joined: 28 Jan 2009
Posts: 6

PostPosted: Fri 10 Apr '15 13:03

What about this... The signatures of these requests almost always use the user agent string I mentioned in that quoted log entry. Taking that into account, validating the cookie, and combining both with whether or not the request also pulls any images, CSS, or JavaScript paints a picture of the qualifying conditions needed to decide whether the IP should be denied access.

Focusing only on the cookie part, maybe you could explain to me why the following doesn't work?:

Code:
RewriteCond %{HTTP_COOKIE} !arealua [NC]
RewriteCond %{HTTP_COOKIE} !arealua=true$ [NC]
RewriteRule ^(.*)$ /test [R=301,L]


What I'm trying to do with the above is check whether the requester has the cookie called "arealua" and whether it's set to true. (Think of PHP's isset() combined with === true.) The thing is that I've been testing this with Firefox and Firebug by resetting values, etc., and it's hit-and-miss.

Any ideas why?
glsmith
Moderator


Joined: 16 Oct 2007
Posts: 2268
Location: Sun Diego, USA

PostPosted: Fri 10 Apr '15 20:26

Wolf_22 wrote:
Any ideas why?


Maybe. In most cases the cookie is not available right away: the rewrite happens before any cookie is set, I believe, and therefore at least one request is going to make it through. You are testing with Firefox, which accepts cookies; the bot itself most likely doesn't care and may never return the cookie on any subsequent request. And since it seems to be a bot, it probably only makes a single request anyway.

I would also think that since the cookie is not yet set on a legitimate first request, this would punt real visitors out as well: !arealua matches, and since arealua is not present, !arealua=true is TRUE too.
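Incidentally, the two conditions in the posted snippet collapse into one: when the cookie is absent both !arealua and !arealua=true are true, and when the cookie reads, say, arealua=false, the !arealua condition fails and the rule never fires at all. A single condition is one way to express the intended "missing or not true" test (sketch only, with the /test target from the original snippet):

```apache
RewriteEngine On
# True when the Cookie header does not contain "arealua=true",
# i.e. cookie absent OR set to any other value
RewriteCond %{HTTP_COOKIE} !arealua=true [NC]
RewriteRule ^ /test [R=301,L]
```

This still has the first-request problem described above; it only fixes the boolean logic, not the bootstrapping.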

Then there are people like me who almost always refuse to accept cookies (I use CookieSafe in Firefox). I allow them temporarily if I absolutely have to and curse the site the entire time I'm surfing it, or simply move on.

And last but not least, the EU has a thing about asking users to accept cookies before setting them. Most sites never pay attention to this but if you are selling any products/services, they may try to enforce it ... if you're really unlucky.

At least with mine, if x should never ever = /blah/i on any legitimate request, what's the problem? It's the KISS principle, in my opinion, which may or may not amount to much. Laughing
Wolf_22



Joined: 28 Jan 2009
Posts: 6

PostPosted: Mon 13 Apr '15 1:31

Quote:
At least with mine, if x should never ever = /blah/i on any legitimate request, what's the problem? It's the KISS principle, in my opinion, which may or may not amount to much. Laughing


That would work if it were SS (minus the KI. Laughing).

The issue is that the request URI alone doesn't define the legitimacy of the request. I jumped in thinking I could rely on the cookie's existence as one part of determining overall legitimacy, but if what you say is true, I'll have to rely instead on a combination of the URI they request, their browser agent string(s), and the assets they request (assuming all legitimate users' requests will also show accesses of the various image and CSS files). <-- Is that true?

So, here's what I'm thinking of using to determine legitimacy (RewriteCond lines):

1.) Does the browser agent string contain "Mozilla" and "AppleWebkit"?
2.) Have they requested any assets (i.e. - image and CSS files)?
3.) Is the endpoint they're trying to access "blah/test"?
4.) Do they have a referrer set?

In my specific situation(s), I would think this would eliminate at least 80% of the spam bots. For my situation, "blah/test" is actually a login... So if during that request, they're not allowing images or CSS to also be accessed and they have no referrer, that would tell me that it's likely a spam bot (most of the useless browser agent strings for these requests contained "Mozilla" and "AppleWebkit", which is why I included that).
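Criteria 1, 3, and 4 can at least be expressed as stacked RewriteConds (implicitly ANDed). Criterion 2 cannot be tested inside a single rewrite rule, since each asset fetch arrives as a separate request with no memory of the others. A rough, hedged sketch of the other three, using the [F] flag to answer with 403 Forbidden rather than a redirect:

```apache
RewriteEngine On
# 3) the suspicious endpoint (the blah/test value from the logs above)
RewriteCond %{QUERY_STRING} ^x=blah/test$ [NC]
# 1) user agent claims both Mozilla and AppleWebKit
RewriteCond %{HTTP_USER_AGENT} Mozilla [NC]
RewriteCond %{HTTP_USER_AGENT} AppleWebKit [NC]
# 4) no referrer at all
RewriteCond %{HTTP_REFERER} ^$
# All conditions met: refuse the request outright
RewriteRule ^ - [F]
```

As the next reply points out, each of these conditions also matches plenty of legitimate traffic, so treat this as a shape for the rule, not a recommendation.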

Thoughts?
glsmith
Moderator


Joined: 16 Oct 2007
Posts: 2268
Location: Sun Diego, USA

PostPosted: Mon 13 Apr '15 4:33

Different story if x=blah/test can be a legitimate request. In that case you might be screwed. Modify your CMS to make blah something else. Whatever.

Regardless of the route you want to take, think it through: "If I do this, can it also cause a legitimate request to fail?" If the answer is yes, you probably do not want to do that.

1.) Does the browser agent string contain "Mozilla" and "AppleWebkit"?
I am on Chrome, every WebKit based browser like Chrome will match both because they are also "Mozilla/x.x Compatible." I am not a bot.

The illegitimate requests have these too, per the example posted above. It may be a bot, but I am not a bot, yet I fit the profile.

2.) Have they requested any assets (i.e. - image and CSS files)?
Look at your access log, every get/post/put/head/etc. is a separate request. When I as a visitor request /?x=blah/test, I'm not requesting a css or png or anything except /?x=blah/test. I am Mozilla & AppleWebkit, but I am not a bot.

3.) Is the endpoint they're trying to access "blah/test"?
Yeah, and apparently I as a visitor can make a legitimate request to that and expect to receive something back for it. I am not requesting a css/png/etc, I am Mozilla & AppleWebkit, but still I am not a bot.

4.) Do they have a referrer set?
I as a visitor came from a bookmark so I have no Referrer. I have legitimately requested /?x=blah/test, I'm not requesting a css/png/etc, I am Mozilla & AppleWebkit, yet for some reason I am still not a bot.

