Topic: How To Modify HTTP Response In A Proxy Environment
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Fri 05 Feb '21 23:25 Post subject: How To Modify HTTP Response In A Proxy Environment
I wanted to know the best way to replace the <HEAD> tag in an HTTP response when Apache is set up as a load-balancing reverse proxy server.
I have a few application servers running JBoss, and Apache is set up to load-balance the traffic that comes to these servers.
I need to inject a script tag:
<script> etc..</script>
inside the <HEAD> element of the HTML that the back end sends back through Apache.
I have used the Substitute directive of mod_substitute, but it does not seem to work. It does work when Apache is not acting as a reverse proxy and load balancer.
I was wondering whether I am missing some ProxySet directives for the proxy server.
Any ideas on what to do?
mraddi
Joined: 27 Jun 2016 Posts: 152 Location: Schömberg, Baden-Württemberg, Germany
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Sat 06 Feb '21 21:58
Thanks mraddi.
I have tried many things, and I also tried your code. When Apache parses the configuration, it says:
Unknown Filter Provided INFLATE
As long as the back-end server is something else, such as IIS or Apache, ProxyPass and ProxyPassReverse work, but my issue is that the back-end server is JBoss, and whatever I do, the JBoss server seems to wipe out the change and put its own content there.
I might look at another method that doesn't involve Apache, because I am losing my hair over the number of things I have tried.
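For reference, the INFLATE filter (like DEFLATE) is registered by mod_deflate, so an unknown-filter error naming INFLATE usually means mod_deflate is not loaded. A minimal sketch, assuming the modules/ path layout used elsewhere in this thread:

```apache
# INFLATE and DEFLATE are registered by mod_deflate, SUBSTITUTE by
# mod_substitute, and AddOutputFilterByType/FilterProvider come from
# mod_filter, so all three must be loaded before a filter chain names them.
LoadModule filter_module modules/mod_filter.so
LoadModule deflate_module modules/mod_deflate.so
LoadModule substitute_module modules/mod_substitute.so
```

Running `httpd -M` lists the modules actually loaded, which is a quick way to confirm that deflate_module is present.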
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Sat 06 Feb '21 22:23
It's funny: when I set
LogLevel substitute:trace8
I do see the substitution take place in the error log, but I can't see it when I view the page source.
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Sun 07 Feb '21 0:41
I saw your post earlier today, but haven't had time to prepare a more detailed reply.
As mraddi suggests, the problem is that the proxied response from JBoss is compressed, which prevents mod_substitute from filtering the response body. I've had exactly the same problem in the past.
His recommendation, filter-wise, is to inflate the proxied response, perform the substitution, and then compress the result before sending it to the client.
My solution, to save the inflate performance overhead (since my JBoss servers were on a local network), was simply to remove the Accept-Encoding request header, so that JBoss sent an uncompressed response which Apache could then filter. Put the following in your proxy location block, or in the global configuration if that is acceptable.
Code: | RequestHeader unset Accept-Encoding |
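For comparison, the inflate, substitute, deflate approach described above can be sketched as a single filter chain. This is an illustration rather than the exact configuration suggested earlier in the thread; the back-end URL and the injected script path are hypothetical placeholders:

```apache
<Location "/">
    # Hypothetical JBoss back end; use a balancer:// URL when load balancing.
    ProxyPass "http://jboss-backend.example/"
    ProxyPassReverse "http://jboss-backend.example/"

    # INFLATE decompresses a gzip-encoded back-end response, SUBSTITUTE edits
    # the plain HTML, and DEFLATE recompresses it if the client accepts gzip.
    AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE text/html

    # Fixed-string (n), case-insensitive (i) match on the opening head tag.
    Substitute "s@<head>@<head><script src='/js/injected.js'></script>@in"
</Location>
```

Unlike removing Accept-Encoding, this keeps the traffic between JBoss and Apache compressed, at the cost of an extra decompress and recompress for every filtered response.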
The greater challenge, should you accept it, is then to compress the response conditionally once mod_substitute has done its work, e.g.
1) Use mod_rewrite to set an environment variable recording the fact that the client request included an Accept-Encoding header.
2) Remove the Accept-Encoding request header before the request is sent to the proxied back-end servers.
3) Use mod_substitute to filter the response body.
4) Conditionally run the deflate filter (with force-gzip set).
Note that the deflate filter won't run if the Accept-Encoding request header is missing, hence the need for the force-gzip variable (check the module source if you want to see what really happens).
Also, rather than use the AddOutputFilterByType directive, I used the newer "filter chain" options to create a smart filter; see https://httpd.apache.org/docs/2.4/mod/mod_filter.html. The FilterProvider directive is more flexible, since you can use an ap_expr expression to control execution dynamically, based on the content type as well as on the presence of the original Accept-Encoding request header.
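The four steps above can be sketched compactly as follows. This is an illustration under assumptions, not the configuration from this thread: it swaps in mod_setenvif instead of mod_rewrite for step 1 (SetEnvIf in the server context runs before RequestHeader modifies the request headers at fixup time), and the back-end URL and script path are hypothetical placeholders.

```apache
# Step 1: record whether the client accepts compressed responses
# (mod_setenvif used here in place of mod_rewrite).
SetEnvIfNoCase Accept-Encoding "gzip|deflate" force-gzip=true

# Smart filters: substitute HTML, then compress only if step 1 matched.
FilterDeclare Replace
FilterProvider Replace SUBSTITUTE "%{CONTENT_TYPE} =~ m#^text/html#i"
FilterDeclare Compress
FilterProvider Compress DEFLATE "%{CONTENT_TYPE} =~ m#^text/html#i && reqenv('force-gzip') == 'true'"
FilterProtocol Compress change=yes;byteranges=no

<Location "/">
    # Hypothetical back end.
    ProxyPass "http://jboss-backend.example/"
    ProxyPassReverse "http://jboss-backend.example/"

    # Step 2: ask the back end for an uncompressed response.
    RequestHeader unset Accept-Encoding

    # Steps 3 and 4: run the two smart filters in this order.
    FilterChain Replace Compress
    Substitute "s@<head>@<head><script src='/js/injected.js'></script>@in"
</Location>
```

Setting force-gzip is what lets DEFLATE run even though the Accept-Encoding header has already been removed by the time the response is filtered.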
I'll see if I can dig out some previous configuration settings and post examples.
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Sun 07 Feb '21 13:28
The following are code snippets you should be able to weave into your configuration.
Edit the various FilterProvider entries to match your requirements, depending on which content types you want to filter, noting the FilterChain directive in the location block. You might also need to set the SubstituteMaxLineLength directive, depending on the length of the JBoss content lines you're trying to filter.
Note that here I've replaced any original Accept-Encoding request header with the value 'identity', which asks JBoss for a response with no compression or other encoding.
Code: | # Load additional modules.
#
LoadModule deflate_module modules/mod_deflate.so
LoadModule filter_module modules/mod_filter.so
LoadModule rewrite_module modules/mod_rewrite.so
# Configuration for substitute filter.
#
FilterDeclare Replace
FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^application/(javascript|json|xml)#i"
FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^text/(css|html|javascript|plain|xml)#i"
# Configuration for deflate (gzip) filter.
#
FilterDeclare Compress
FilterProvider Compress Deflate "%{CONTENT_TYPE} =~ m%^application/(javascript|json|xml)%i && reqenv('force-gzip') == 'true'"
FilterProvider Compress Deflate "%{CONTENT_TYPE} =~ m%^text/(css|html|javascript|php|plain|xml)%i && reqenv('force-gzip') == 'true'"
FilterProtocol Compress change=yes;byteranges=no
# Enable rewrite engine.
#
RewriteEngine On
# Check for Accept-Encoding request header, and if found set force-gzip variable.
#
RewriteCond %{HTTP:Accept-Encoding} '(deflate|gzip)' [NC,NV]
RewriteRule .* - [E=force-gzip:true,NE]
# Define secure site virtual host.
#
<VirtualHost *:443>
# Inherit mod_rewrite logic.
#
RewriteEngine On
RewriteOptions InheritBefore
# Enable SSL for this virtual host.
#
SSLEngine on
# Proxy selected requests to JBoss.
#
<LocationMatch ^/(.*)$>
# Insert filter chain for this site location.
#
FilterChain Replace Compress
# Replace original Accept-Encoding header with identity request.
#
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
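# Use http:// for a plain back end; an https:// back end also needs SSLProxyEngine on.
ProxyPassMatch http://my-jboss-server/$1
# Use the substitute module to edit the response body,
# hence the need for the content to be uncompressed.
#
Substitute "s@before@after@inq"
</LocationMatch>
# ProxyPassReverse treats a <LocationMatch> regex literally, so set it here instead.
ProxyPassReverse / http://my-jboss-server/
</VirtualHost> |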
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Mon 08 Feb '21 10:09
Tangent,
Thank you very much for your help. That worked. The trick was the encoding, as you mentioned. Thank you for taking the time to find the code.
I'd like to thank mraddi as well for his help; he had me trying many things and gave me a whole new perspective on the situation that I had never considered before.
gunderwood
Joined: 02 Mar 2022 Posts: 11 Location: US, Greenville, SC
Posted: Mon 07 Mar '22 16:11
Hi,
I am trying to add the substitute filter from this configuration to my vhost.conf file. When I add the following code, Apache reports an error: "Bad Substitute format, must be an s/// pattern".
RewriteEngine On
# Configuration for substitute filter.
#
AddOutputFilterByType SUBSTITUTE text/html
Substitute "%{CONTENT_TYPE} =~ m#^application/(javascript|json|xml)#i"
Thanks in advance!
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Mon 07 Mar '22 19:21
Your Substitute statement is indeed invalid.
I notice you've used AddOutputFilterByType rather than the FilterProvider structure in the example above. It's the latter that accepts the %{CONTENT_TYPE} match expression, which determines conditionally which content types the filter runs for. With AddOutputFilterByType, you can't make the filter run conditionally for a given content type.
Whichever variant you choose to run the filter, the Substitute statement should be something of the form:
Code: | Substitute "s@before@after@inq"
|
I tend to use an "@" symbol as the delimiter here, since "/" will most likely appear in content URLs in the response body, and it's often these URLs that need editing.
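Applied to the vhost.conf snippet above, a minimal sketch might look like this. It assumes mod_filter, mod_headers and mod_substitute are loaded and that the proxy settings already live in the virtual host; `before` and `after` are the same placeholders as above. Use one of the two variants, not both: each inserts its own SUBSTITUTE filter, so enabling both would apply the pattern twice.

```apache
<VirtualHost *:443>
    # ... existing SSL and ProxyPass/ProxyPassReverse settings ...

    <Location "/">
        # Ask the back end for an uncompressed body that SUBSTITUTE can edit.
        RequestHeader unset Accept-Encoding

        # Variant 1: run SUBSTITUTE for a fixed list of content types.
        AddOutputFilterByType SUBSTITUTE text/html

        # Variant 2 (instead of variant 1, never alongside it): choose the
        # content types with an ap_expr through a mod_filter smart filter.
        #FilterDeclare Replace
        #FilterProvider Replace SUBSTITUTE "%{CONTENT_TYPE} =~ m#^text/html#i"
        #FilterChain Replace

        # The Substitute directive itself always takes an s/// style
        # pattern; here "@" is the delimiter.
        Substitute "s@before@after@inq"
    </Location>
</VirtualHost>
```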
If you do crack this proxy content problem, please could you update the other post, so we don't have two sets of discussion running.
gunderwood
Joined: 02 Mar 2022 Posts: 11 Location: US, Greenville, SC
Posted: Tue 08 Mar '22 15:05
Hi tangent,
I posted in this thread because I copied the code snippet from the configuration above. I'm sorry; I really am out of my depth on this topic. Looking at the code in this thread, how would I implement the same substitute statement in my vhost.conf file?
James Blond Moderator
Joined: 19 Jan 2006 Posts: 7399 Location: EU, Germany, Next to Hamburg
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Mon 24 Feb '25 10:08
Hi Tangent,
I am coming back to the original code that you posted 3 years ago. After 3 years, we have noticed that this code causes a double substitution.
What I mean is this:
My code
-------
FilterDeclare Replace
FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^application/(javascript|json|xml)#i"
FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^text/(css|html|javascript|plain|xml)#i"
FilterDeclare Compress
FilterProvider Compress Deflate "%{CONTENT_TYPE} =~ m%^application/(javascript||xml)%i && reqenv('force-gzip') == 'true'"
FilterProvider Compress Deflate "%{CONTENT_TYPE} =~ m%^text/(css|html|javascript|php|plain|xml)%i && reqenv('force-gzip') == 'true'"
FilterProtocol Compress change=yes;byteranges=no
RewriteEngine On
# RewriteOptions InheritBefore to accommodate AppDynamics monitoring
RewriteOptions InheritBefore
RewriteCond %{HTTP:Accept-Encoding} '(deflate|gzip)' [NC,NV]
RewriteRule .* - [E=force-gzip:true,NE]
<LocationMatch ^/(.*)$>
FilterChain Replace Compress
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
SubstituteMaxLineLength 5M
Substitute "s#<head>#<head><script> alert('This is a test')</script>#inq"
</LocationMatch>
What happens is this:
When the page displays and I view its source (right-click on the page, View Page Source), I see:
<head><script> alert('This is a test')</script><script> alert('This is a test')</script>
That means I get the <script> tag twice.
Now, if I comment out this line
FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^text/(css|html|javascript|plain|xml)#i"
like this:
#FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#^text/(css|html|javascript|plain|xml)#i"
then the duplicate doesn't happen, and my final output looks normal:
<head><script> alert('This is a test')</script>
Why is that?
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Tue 25 Feb '25 22:33
I can't provide an obvious explanation.
The regex definitions in the FilterProvider statements assume the "text" or "application" types are anchored to the beginning of the CONTENT_TYPE header, so only one or the other should trigger the Substitute filter, not both. That is, unless there are two matching Content-Type headers in the response (which isn't legal).
My original regexes were based on the content types present in a client application I was hosting at the time. Predictably, they might need revising to match your requirements.
As an experiment, you could try combining both match sets into one regex, e.g.
Code: | FilterProvider Replace Substitute "%{CONTENT_TYPE} =~ m#(^text/(css|html|javascript|plain|xml))|(^application/(javascript|json|xml))#i"
|
If you still get two substitutions, that suggests to me something odd with the response headers. I'd use the browser developer tools to check exactly which Content-Type header comes back with the response body you want to edit, and change the regex to be more restrictive.
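For example, since `<head>` should only appear in HTML, a more restrictive provider might match text/html alone. A sketch that would replace the two `FilterProvider Replace` lines in the configuration above, assuming the page you want to edit really is served as text/html:

```apache
FilterDeclare Replace
# Match only HTML responses; "^text/html" also covers values with a
# parameter, such as "text/html;charset=ISO-8859-1".
FilterProvider Replace SUBSTITUTE "%{CONTENT_TYPE} =~ m#^text/html#i"
```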
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Sun 02 Mar '25 23:25
Hi Tangent,
After I made the changes you suggested, the same thing happened. This is a breakdown of all the files that get loaded:
.html files: Content-Type is text/html;charset=ISO-8859-1
.css files: Content-Type is text/css or text/css;charset=ISO-8859-1
.js files: Content-Type is text/javascript
Other .js files: Content-Type is application/javascript, but the same response headers also contain an "X-Content-Type-Options" header with the value "nosniff"
Also, the same line in Chrome developer tools loads a Content-Type of image/jpeg;charset=ISO-8859-1
JavaScriptServlet files: Content-Type is text/plain
That pretty much sums up all the Content-Types on that page.
Do you think having both Content-Type and X-Content-Type-Options in the same response headers causes the issue?
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Mon 03 Mar '25 3:38
What about the possibility of a firewall manipulating the CONTENT_TYPE? Could that happen, or cause an issue?
Would adding a FilterTrace statement help in debugging this and seeing what is being substituted the second time?
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Mon 03 Mar '25 19:26
apishdad wrote: | Do you think having Content-Type and x-Content-Type-Options on the same page Response Header causes the issue? |
I'd rather not say without testing. They're different header names, and the Apache expression parser simply describes %{CONTENT_TYPE} as "The content type of the response (not available during <If>)".
But looking at the content you're trying to edit, namely <head>, I'd expect that to be in a text/html response body, so you could restrict your filter provider accordingly.
Sometimes, it's best to try and go round a problem, rather than work through it.
Accepting that the substitute filter appears to be running more than once, how about changing the regular expression to include a negative lookahead, so that the required substitution should only match the first time, i.e.
Code: | Substitute "s@<head>(?!<script>)@<head><script> alert('This is a test')</script>@iq" |
Note: do remove the 'n' option, since we do want a regex check.
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Wed 05 Mar '25 4:47
I tried your suggestion, but it did not work. Do you think running a Wireshark trace on that server would help, or is it better to add a FilterTrace statement and debug it that way?
tangent Moderator
Joined: 16 Aug 2020 Posts: 370 Location: UK
Posted: Wed 05 Mar '25 15:36
Like you, I'm confused, particularly about why the revised regex (negative lookahead) didn't work for you. It worked for me on a test instance (admittedly with local rather than proxied content). When you say it doesn't work, do you mean you still get two substitutions, or none?
One other thing I find confusing is your double substitution above, where you say the result is
Code: | <head><script> alert('This is a test')</script><script> alert('This is a test')</script> |
whereas if the substitution were running twice, I'd expect another '<head>' in the middle, viz:
Code: | <head><script> alert('This is a test')</script><head><script> alert('This is a test')</script> |
So yes, you could try FilterTrace (I've never used it), though I'd probably raise the log level on the filter first, e.g.
Code: | LogLevel filter:trace6 |
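A sketch of both debugging steps, assuming the `Replace` smart filter declared in the configuration earlier in the thread. FilterTrace takes a declared filter name and a level; per the mod_filter documentation, level 1 logs the buckets and brigades reaching the filter before the provider processes them, and that output only appears when mod_filter logs at debug level or higher (which the trace level below satisfies). Treat the exact log output as something to confirm on your own build.

```apache
# Verbose logging for mod_filter and mod_substitute only; other modules stay at warn.
LogLevel warn filter:trace6 substitute:trace8

# Record what enters the "Replace" smart filter, before SUBSTITUTE runs.
FilterTrace Replace 1
```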
Beyond that, I'd also use Wireshark if you can, to check precisely what is being passed back in your proxied content, but I'm speculating that your site is secure, so you won't be able to use Wireshark (easily) on your front-end network traffic.
Yes, there could be some upstream filtering/caching taking place here. So what happens if you add no-cache response headers to this page?
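To try the no-cache idea, a sketch assuming mod_headers is loaded (it must already be, since the configuration uses RequestHeader) and reusing the thread's LocationMatch pattern; if that block already exists, the two Header lines can go inside it:

```apache
<LocationMatch "^/(.*)$">
    # Ask browsers and intermediate caches not to reuse a stored copy, so
    # every view of the page reflects the current filter output. The default
    # (onsuccess) table replaces a Cache-Control header sent by the back end
    # instead of adding a second one.
    Header set Cache-Control "no-cache, no-store, must-revalidate"
    Header set Pragma "no-cache"
</LocationMatch>
```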
apishdad
Joined: 01 Jul 2019 Posts: 57 Location: Canada, Toronto
Posted: Wed 05 Mar '25 18:06
I just found what the issue was:
Somebody had put the following line in a different file:
AddOutputFilterByType SUBSTITUTE text/html application/problem+json
I commented out that line, and everything works fine. My apologies for taking up your time on this.
I appreciate and value your knowledge and time. Great knowledge, and a pleasure to work with you.
Thank you very much for all your help and wisdom.
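In hindsight, the pieces fit together: the FilterChain (through the Replace smart filter) and AddOutputFilterByType are independent ways of inserting SUBSTITUTE, and each inserted copy applies the Substitute patterns in scope, so text/html pages were edited twice. Applying s#<head>#<head><script>…</script># a second time also produces exactly the output reported earlier, because the second pass matches the <head> that the first pass kept and inserts the script again right after it, so no second <head> appears. Commenting out the text/ FilterProvider line removed one of the two copies for text/html, which is why that hid the problem. A sketch of keeping a single insertion point, based on the configuration posted above:

```apache
# In the other included file. This also inserted SUBSTITUTE for text/html,
# so the same patterns ran a second time; leave it commented out.
#AddOutputFilterByType SUBSTITUTE text/html application/problem+json

<LocationMatch "^/(.*)$">
    # The FilterChain is now the only place SUBSTITUTE is inserted here.
    FilterChain Replace Compress
    RequestHeader set Accept-Encoding identity
    SubstituteMaxLineLength 5M
    Substitute "s#<head>#<head><script> alert('This is a test')</script>#inq"
</LocationMatch>
```

Running `httpd -t -D DUMP_INCLUDES` lists every file pulled in through Include directives, which can help track down a stray filter directive like this one.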