Because other sites are already served by Apache2 on port 80 of this machine, squid or varnish cannot also listen on that port, so instead I use Apache’s mod_proxy. This lets Apache keep serving its existing virtual hosts on port 80 while the same server also contributes to the caching pool.

Quick Start

This assumes your Apache 2 server is already working and capable of serving virtual hosts on port 80. Replace SERVER with the name of the web site you are mirroring, indymedia.org for example.

Apache configuration

Place this virtual host configuration in the appropriate file for your apache2 server (on Debian, /etc/apache2/sites-available/SERVER):
<VirtualHost *:80>
    #LogLevel emerg
    #LogLevel debug
    ErrorLog /dev/null
    LogFormat "noip - - %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %V" noip
    CustomLog /var/log/apache2/access.log noip

    ServerName SERVER
    #ServerAlias www.SERVER

    <Proxy http://*SERVER:80/*>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyRequests Off
    CacheRoot /var/www/SERVER/cache
    CacheEnable disk /
    # DO NOT OMIT THE TRAILING / IN THE NEXT TWO LINES!!!
    ProxyPass / http://SERVER:80/
    ProxyPassReverse / http://SERVER:80/
    ProxyRequests Off
    #ProxyTimeout 60
    ProxyReceiveBufferSize 2048
    ProxyPreserveHost On
    ServerSignature Off
</VirtualHost>

Set up the Cache, proxy modules, and reload Apache

mkdir -p /var/www/SERVER/cache
# The following instructions are Debian specific, adapt as appropriate
chown www-data /var/www/SERVER/cache # make sure the web server can read/write here
a2ensite SERVER # "enable" this vhost (really just a symlink)
# make sure the disk-backed http caching proxy modules are enabled
a2enmod proxy
a2enmod disk_cache
a2enmod cache
a2enmod proxy_http
# reload the server configuration **BUT SEE TESTING BELOW FIRST**!
/etc/init.d/apache2 reload
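
Before reloading, it can help to let Apache check the configuration syntax. The following is a minimal sketch for a Debian-style install, assuming apache2ctl is on your path. The commented-out line applies only if you run Apache 2.4, where the disk cache module is named cache_disk rather than disk_cache and the Order/Allow lines in the Proxy block are normally written as "Require all granted".

```shell
# Check the configuration syntax; prints "Syntax OK" when the files parse.
apache2ctl configtest

# Apache 2.4 only (assumption: you are on 2.4): the disk cache module was
# renamed, so enable cache_disk instead of disk_cache.
# a2enmod cache_disk
```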

Testing

BEFORE you reload your apache configuration, that is, BEFORE your mirror should be working, tell the machine from which you’ll be testing (the one running your web browser) that the IP address of SERVER is actually the IP address of your apache box. On Linux this means adding a line to /etc/hosts which looks something like this:
1.2.3.4      SERVER www.SERVER

You will usually then need to RESTART YOUR WEB BROWSER. When this is set up, try http://SERVER in your web browser to confirm it is definitely your not-yet-configured server rather than the real SERVER.
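
If editing /etc/hosts is inconvenient, curl can make the same check from the command line. This is only a sketch: it assumes curl is installed and reuses the example address 1.2.3.4 from above, so substitute the real IP address of your apache box.

```shell
# Pretend SERVER resolves to 1.2.3.4 for this one request (no /etc/hosts
# change needed) and print only the response headers.
curl -sI --resolve SERVER:80:1.2.3.4 http://SERVER/
```

Before the vhost is enabled, the response should come from your own Apache’s default site rather than from the real SERVER.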

When the above test confirms your web browser is talking to your apache server, reload the apache configuration, then reload http://SERVER and follow a few links there. If everything is working, you’ll see a mirrored copy of SERVER; check your apache log to be sure.

It never hurts to have someone on a different computer in a different location test your mirror too.

When your mirror seems to be working, someone will change DNS appropriately. It is a good idea to watch your logs and your disk space, especially at first. For example, check that your page service times (the %T field in the log format above, in seconds) look reasonable.
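
One way to keep an eye on service times is to summarize the %T values from the access log. The sketch below assumes the "noip" LogFormat from the example, where %T is the second-to-last field on each line; the function name service_times is made up for illustration.

```shell
# Print request count, mean and max service time (whole seconds) for a log
# written with the "noip" LogFormat, where %T is the second-to-last field.
service_times() {
    awk '{ t = $(NF-1); sum += t; if (t > max) max = t; n++ }
         END { if (n) printf "requests=%d mean=%.2f max=%d\n", n, sum / n, max }' "$1"
}

# Typical use on the mirror:
#   service_times /var/log/apache2/access.log
# Disk usage of the cache directory:
#   du -sh /var/www/SERVER/cache
```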

Troubleshooting

If some pages work and others do not, or if the CSS or images don’t always work, or if you see error/status 400 in your apache log, be sure you have those trailing / characters in your apache proxy config.
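
To see whether status 400 responses are actually occurring, the access log can be summarized by status code. This is a sketch that assumes the "noip" LogFormat from the example, where the status follows the quoted request line; the function names status_counts and bad_requests are illustrative only.

```shell
# Count responses per HTTP status. Splitting on the double quote makes the
# text after the request line start with "status bytes".
status_counts() {
    awk -F'"' '{ split($3, a, " "); count[a[1]]++ }
               END { for (s in count) print s, count[s] }' "$1" | sort
}

# List the request lines that were answered with status 400.
bad_requests() {
    awk -F'"' '{ split($3, a, " "); if (a[1] == "400") print $2 }' "$1"
}

# Typical use on the mirror:
#   status_counts /var/log/apache2/access.log
#   bad_requests  /var/log/apache2/access.log
```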

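The apache log /var/log/apache2/access.log usually has important clues. You may need to change the LogLevel to debug until your mirror is working. Note that LogLevel controls the error log, which the example sends to /dev/null, so point ErrorLog at a real file while you are debugging. Don’t leave it that way because it may log sensitive information about clients!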

Other Topics

Sometimes the site being mirrored moves to a port other than 80 (to escape their load yet share their site with mirrors). In that case simply change the :80 in the apache config to the correct port and reload.
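
The port change can be made by hand, or with a small helper like the sketch below. The function name set_upstream_port, the host example.org, and the port 8080 are all hypothetical; the pattern only touches "HOST:port/" so the VirtualHost *:80 line, which is the port your own Apache listens on, is left alone.

```shell
# Usage: set_upstream_port VHOST_FILE HOST NEW_PORT
# Rewrites every "HOST:<digits>/" in the file to "HOST:NEW_PORT/".
# Note: dots in HOST are treated as regex wildcards, which is harmless here.
set_upstream_port() {
    file=$1 host=$2 port=$3
    sed "s#${host}:[0-9][0-9]*/#${host}:${port}/#g" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Typical use (then reload Apache):
#   set_upstream_port /etc/apache2/sites-available/example.org example.org 8080
```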

If the server you are mirroring can be reached through several different host names, use the ServerAlias apache directive. A typical use is included, commented out, in the example.

Some web sites use cookies in a way which defeats proxy caching, so your mirror doesn’t help them at all. There are proxy-specific workarounds, sometimes, but the best repair usually involves changing the master site configuration.
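
A quick way to see whether cookies or cache headers might be getting in the way is to look at the response headers of a typical page. This is a rough diagnostic sketch, assuming curl is installed; it does not prove a page is uncacheable, it only shows the headers worth reading.

```shell
# Show the response headers that most often decide whether a page is cacheable.
curl -sI http://SERVER/ | grep -iE '^(set-cookie|cache-control|pragma|expires|vary):'
```

A Set-Cookie header on every page, or Cache-Control values such as private or no-cache, suggests the proxy may be unable to serve that page from its cache.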

The Apache proxy/cache module(s) have many settings and you may be able to improve upon this example. Here are the relevant Apache configuration references: