Using apache as a filtering web proxy
There are many kinds of web proxies, and many web proxies available. This howto covers the case where you want to filter the content of all web traffic. There are several fancy packages for doing this on your home computer (to filter out ads, popups, web bugs, etc.), such as middleman and filterproxy.
These packages are cool, but they are complicated, a little unstable (in my limited experience), and overkill for what I need. Instead, this howto uses a very simple apache2 setup that passes all proxied HTML through a sed script.
Currently, I think the Debian packages needed for this only exist for apache2. The idea may carry over to apache1, but mod_ext_filter itself is an Apache 2 module.
bc. # apt-get install apache2
Set the port the proxy will listen on in /etc/apache2/ports.conf:
bc. Listen 8080
Disable the modules we don't need and enable the ones we do:
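bc. cd /etc/apache2
rm sites-enabled/*                         # disable the default site
rm mods-enabled/*                          # disable every module...
cd mods-enabled
ln -s ../mods-available/proxy.* .          # ...then enable mod_proxy (proxy.load and proxy.conf)
ln -s ../mods-available/ext_filter.load .  # ...and mod_ext_filter
# Apache 2.2 and later split the HTTP proxy out into proxy_http.load, and newer
# Debian releases also need an MPM module; a2enmod/a2dismod handle this there.
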
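After these commands, mods-enabled should hold links for proxy.load, proxy.conf and ext_filter.load. As a quick sanity check, here is a small shell sketch; the function name check_mods is hypothetical, and the directory argument (defaulting to Debian's /etc/apache2/mods-enabled) exists only so the check can be pointed elsewhere.

```shell
#!/bin/sh
# check_mods [DIR]: hypothetical helper that reports whether the modules
# this howto relies on are linked into DIR (default: Debian's mods-enabled).
# [ -e ] follows symlinks, so a dangling link is reported as missing.
check_mods() {
    dir=${1:-/etc/apache2/mods-enabled}
    status=0
    for f in proxy.load proxy.conf ext_filter.load; do
        if [ -e "$dir/$f" ]; then
            echo "enabled: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

Run check_mods with no argument on the proxy host; a non-zero exit status means one of the links is missing or points nowhere.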
Enable the proxy and set up the filter by editing /etc/apache2/mods-enabled/proxy.conf (a symlink to mods-available/proxy.conf):
bc. ExtFilterDefine my-filter mode=output intype=text/html cmd="/path/to/script"
<IfModule mod_proxy.c>
    ProxyRequests On
    <Proxy *>
        # On Apache 2.4, replace the next three lines with: Require ip 127.0.0.1
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
        SetOutputFilter my-filter
    </Proxy>
    ProxyVia Off
</IfModule>

What is this doing?
- ExtFilterDefine: defines an external output filter named my-filter that runs every text/html response through /path/to/script.
- ProxyRequests On: turns on forward proxying, so browsers can use Apache as their proxy.
- Order, Deny, Allow: allow only localhost (127.0.0.1) to connect to the proxy. It is bad news to run a proxy that is open to the public, so make sure to limit access to only those who should have it (unless the purpose is to create an anonymizing proxy…).
- SetOutputFilter: runs content passing through the proxy through my-filter.
- ProxyVia Off: tells mod_proxy not to add its own Via: header; any Via: header already present is passed through unchanged.
- No caching: the default proxy.conf has a caching section. It is left out of the configuration above; remove it from yours to disable caching.
Create the script specified in ExtFilterDefine:
bc. #!/bin/sed -f
s/capitalism/free society/g

Make sure the script is executable and owned by www-data, the user Apache runs as on Debian.
The script can be written in any language, as long as it reads the page on standard input and writes the filtered page to standard output. Here we use sed, but perl would work too. This filter replaces every “capitalism” with “free society”.
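Because mod_ext_filter feeds the page to the filter program on standard input and sends its standard output on to the browser, the script can be tried out before Apache is involved. The sketch below is a throwaway test harness, not part of the setup: it writes the same two-line script to a temporary file (FILTER_SCRIPT and run_filter are hypothetical names), makes it executable, and runs it directly, the way Apache runs the cmd given to ExtFilterDefine. It assumes sed lives at /bin/sed, as on Debian.

```shell
#!/bin/sh
# Sketch: run the sed filter the way mod_ext_filter does, outside Apache.
# FILTER_SCRIPT is a hypothetical temporary copy of the script above; on the
# real system use /path/to/script and, as root, run
#   chown www-data /path/to/script && chmod 755 /path/to/script
FILTER_SCRIPT=$(mktemp)
trap 'rm -f "$FILTER_SCRIPT"' EXIT

# Same two lines as the howto's filter: a sed shebang plus one substitution.
printf '%s\n' '#!/bin/sed -f' 's/capitalism/free society/g' > "$FILTER_SCRIPT"
chmod 755 "$FILTER_SCRIPT"   # must be executable, or it cannot be run directly

# The page arrives on stdin and the filtered page leaves on stdout,
# so an ordinary pipe reproduces what happens inside the proxy.
run_filter() {
    "$FILTER_SCRIPT"
}

printf '<p>capitalism, capitalism</p>\n' | run_filter
# prints: <p>free society, free society</p>
```

If the output looks right, install the real script at /path/to/script, restart apache2, and fetch a page containing the word through the proxy, for example with curl -x http://127.0.0.1:8080/ and the page's URL.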
See the mod_ext_filter documentation at httpd.apache.org/docs-2.0/mod/mod_ext_f... for more information on external Apache filters.