charybdis 4.1 upgrade

I am in the process of performing the charybdis 4.1 upgrade and here are my notes.

Upgrade procedure

The Debian package was updated to 4.1 by Unit 193, a recent contributor to the IRCD Debian packages — he also helped with the recent atheme-services upload. The charybdis-4.1 package was rebuilt as a backport for stretch and uploaded to the least busy server at the time (chat0).

CERTFP problems

The server rebooted fine but failed to relink, with this error message:

/var/log/charybdis/ircd.log:2018/6/19 12.18 "sharedconf/connects.conf", line 10: Ignoring connect block for che.indymedia.org -- no fingerprint provided for SSL connection.

The fix was to add the fingerprint of the remote server in the connect {} block. The fingerprint can be extracted from the certificate file with:

certtool -i < /etc/letsencrypt/live/irc.indymedia.org/chain.pem | grep -A 1 Fingerprint -m 1

Unfortunately, this fingerprint is computed over the whole certificate (as opposed to the key), so it changes every time the certificate is renewed. That is a pain in the bottom, all the more so because we’re using Let’s Encrypt, which renews certificates frequently.

This mechanism is defined by the certfp_method setting, which can be changed to spki_sha256 to fingerprint the public key (SPKI) instead; that value stays the same as long as the private key is kept. The problem is that this also changes the CERTFP authentication from clients to NickServ, so we need our users to add their new fingerprints.

To find that fingerprint, you use:

( openssl x509 -pubkey -noout | openssl pkey -pubin -outform DER | sha256sum ) < /etc/letsencrypt/live/irc.indymedia.org/chain.pem
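The pipeline above prints raw `sha256sum` output, while the connects.conf change later in these notes uses a `SPKI:SHA2-256:` prefix. A minimal sketch that wraps the same pipeline and adds that prefix could look like this; the function names `format_spki` and `spki_fingerprint` are made up for illustration.

```shell
# Turn `sha256sum` output ("<hex>  -") into the string used in connect {} blocks.
format_spki() {
    awk '{ print "SPKI:SHA2-256:" $1 }'
}

# Same pipeline as above, applied to the first certificate of the PEM file in $1.
spki_fingerprint() {
    openssl x509 -pubkey -noout < "$1" \
        | openssl pkey -pubin -outform DER \
        | sha256sum \
        | format_spki
}
```

Calling `spki_fingerprint` with the path to a certificate should then print a line that can be pasted into a fingerprint setting.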

The same pipeline works for clients, although we’ve established a simpler, certtool-based procedure for them in the announcement. There are 23 users affected:

root@chat1:~# grep MCFP /var/lib/atheme/services.db | awk '{ print $2 }' | sort -u  | wc -l 
23

That is less than 10% of the userbase:

root@chat1:~# grep ^MU /var/lib/atheme/services.db | wc -l 
338
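The two counts can also be produced in one pass, with the percentage computed at the same time. This is only a sketch: the function name `certfp_share` is hypothetical, and it matches the record type in the first field exactly, which is slightly stricter than the grep patterns above.

```shell
# Count accounts with at least one MCFP entry and the total number of MU entries
# in the services database given as $1.
certfp_share() {
    awk '
        $1 == "MCFP" { seen[$2] = 1 }   # one entry per distinct account
        $1 == "MU"   { total++ }
        END {
            n = 0
            for (a in seen) n++
            if (total > 0)
                printf "%d of %d accounts (%.1f%%)\n", n, total, 100 * n / total
        }
    ' "$1"
}
```

With the numbers above (23 and 338), this works out to roughly 6.8%.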

The nice thing with the CERTFP change is that users do not need to generate a self-signed cert anymore, which greatly simplifies the configuration. The announcement will be sent to users individually, using the following message:

Hi! In about two weeks, a change will be made to this IRC server's configuration which requires you to add a new fingerprint to your NickServ configuration. More information and proof available here: https://we.riseup.net/ircd/certificate-authentication-requires+467154

CERTFP rotation

The above procedure involved upgrading certbot on the main renewal host (chat1.koumbit.net) and renewing with the --reuse-key flag:

certbot certonly --webroot --webroot-path /var/www/html --reuse-key -d irc.indymedia.org
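One way to confirm that a renewal really kept the key is to hash the public half of privkey.pem before and after, and compare the two values. The helper names `key_spki` and `same_key` below are hypothetical; this is a sketch, not the procedure that was actually used.

```shell
# Hash the public key (SPKI, DER-encoded) derived from the private key file in $1.
key_spki() {
    openssl pkey -in "$1" -pubout -outform DER | sha256sum | awk '{ print $1 }'
}

# Succeed only if the two fingerprints are non-empty and identical.
same_key() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage sketch:
#   before=$(key_spki /etc/letsencrypt/live/irc.indymedia.org/privkey.pem)
#   ... run the renewal here ...
#   after=$(key_spki /etc/letsencrypt/live/irc.indymedia.org/privkey.pem)
#   same_key "$before" "$after" && echo "key kept" || echo "key changed"
```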

The certs were then redistributed around the cluster and charybdis reloaded. Then the certfp_method was changed to be based on the key (SPKI) instead of the certificate:

commit d90fec53ecba740f7b2fb91ead4631a962943083
Author: root <root@chat0.koumbit.net>
Date:   Mon Jul 2 12:01:51 2018 -0400

    switch to SPKI

diff --git a/common.conf b/common.conf
index 44b0a75..a5933ce 100644
--- a/common.conf
+++ b/common.conf
@@ -142,8 +142,8 @@ general {
        reject_ban_time = 1 minute;
        reject_after_count = 3;
        reject_duration = 5 minutes;
-       certfp_method = sha1;
-       #certfp_method = spki_sha256;
+       #certfp_method = sha1;
+       certfp_method = spki_sha256;
 };
 
 modules {
diff --git a/connects.conf b/connects.conf
index 79b3e71..f98463a 100644
--- a/connects.conf
+++ b/connects.conf
@@ -6,6 +6,7 @@ connect "che.indymedia.org" {
        hub_mask = "*";
        class = "server";
        flags = ssl, topicburst, autoconn;
+       fingerprint = "SPKI:SHA2-256:8ad5a5348902566d3c5bf859f006b901a620b6a833f5ffaac992d6418ad8c0aa";
 };
 
 connect "chat0.koumbit.net" {
@@ -16,6 +17,7 @@ connect "chat0.koumbit.net" {
        hub_mask = "*";
        class = "server";
        flags = ssl, topicburst, autoconn;
+       fingerprint = "SPKI:SHA2-256:8ad5a5348902566d3c5bf859f006b901a620b6a833f5ffaac992d6418ad8c0aa";
 };
 
 connect "chat1.koumbit.net" {
@@ -31,9 +33,9 @@ connect "chat1.koumbit.net" {
         # Fingerprint:
         #        sha1:4eede5e01df9b58526855c0d6fa22d4306b15d3e
         #        sha256:d706b0aeaba48e8e8a5094be86b34062ed3bafb5aea439f98eff141682b51686
-       fingerprint = "4eede5e01df9b58526855c0d6fa22d4306b15d3e";
+       #fingerprint = "4eede5e01df9b58526855c0d6fa22d4306b15d3e";
        # certtool --pubkey-info --load-privkey /etc/letsencrypt/live/irc.indymedia.org/privkey.pem  | grep sha256 | sed 's/.*sha256:/SPKI:SHA2-256:/' 
-       #fingerprint = "SPKI:SHA2-256:8ad5a5348902566d3c5bf859f006b901a620b6a833f5ffaac992d6418ad8c0aa";
+       fingerprint = "SPKI:SHA2-256:8ad5a5348902566d3c5bf859f006b901a620b6a833f5ffaac992d6418ad8c0aa";
 };
 
 connect "services.indymedia.org" {

The above config was distributed everywhere and yay, it works. It will need to change again when we fix the renewal process: it would be preferable to have different private keys for each host, but for now this is good enough.

AUTHD problems

The new server otherwise seems to work correctly, although a weird bug was introduced during the upgrade (or before?) where connections would hang after the last line of this output:

16:27:56 -!- Irssi: Looking up irc.indymedia.org
16:27:56 -!- Irssi: Connecting to irc.indymedia.org [216.46.7.99] port 6697
16:27:56 -!- Irssi: Certificate Chain:
16:27:56 -!- Irssi:   Subject: CN: irc.indymedia.org
16:27:56 -!- Irssi:   Issuer:  C: US, O: Let's Encrypt, CN: Let's Encrypt Authority X3
16:27:56 -!- Irssi:   Subject: C: US, O: Let's Encrypt, CN: Let's Encrypt Authority X3
16:27:56 -!- Irssi:   Issuer:  O: Digital Signature Trust Co., CN: DST Root CA X3
16:27:56 -!- Irssi: Protocol: TLSv1.2 (256 bit, ECDHE-RSA-AES256-GCM-SHA384)
16:27:56 -!- Irssi: EDH Key: 521 bit ECDH: secp521r1
16:27:56 -!- Irssi: Public Key: 2048 bit RSA, valid from Apr  4 15:38:01 2018 GMT to Jul  3 15:38:01 2018 GMT
16:27:56 -!- Irssi: Public Key Fingerprint:  
          D1:28:F9:B0:4B:92:E0:88:A8:81:1E:84:CA:8E:0C:B0:FC:1E:65:CA:1A:DF:0E:B1:10:2A:AF:D2:83:81:EF:13 (SHA256)
16:27:56 -!- Irssi: Certificate Fingerprint: 
          2F:F8:29:36:5B:70:84:05:35:17:0E:9D:D5:3D:65:21:B1:AC:4B:EB:D8:88:DE:FC:0A:CA:71:12:68:0A:BA:16 (SHA256)
16:27:56 -!- Irssi: Connection to irc.indymedia.org established
16:27:56 !chat0.koumbit.net *** Ident disabled, not checking ident
16:27:56 !chat0.koumbit.net *** Looking up your hostname...
16:27:56 !chat0.koumbit.net *** Couldn't look up your hostname

and eventually abort after a ~60 second timeout:

16:28:56 -!- ERROR Closing Link: 127.0.0.1 (Connection timed out)

The “fix” was the following diff:

commit 184b02d99f3e2272dde3b10a4d8f0636a25baae3
Author: root <root@chat0.koumbit.net>
Date:   Tue Jun 19 17:53:27 2018 -0400

    do not disable_auth, it breaks connexions somehow

diff --git a/common.conf b/common.conf
index 7ec4fdb..21294b1 100644
--- a/common.conf
+++ b/common.conf
@@ -129,7 +129,7 @@ general {
        short_motd = no;
        ping_cookie = yes;
        connect_timeout = 30 seconds;
-       disable_auth = yes;
+       disable_auth = no;
        no_oper_flood = yes;
        max_targets = 4;
        client_flood = 20;

It is unclear why that stopped working, considering it worked before the 4.1 upgrade and that downgrading to 3.5 did not fix the issue.

certbot changes

To keep our sanity, we’ll want to avoid regenerating the private keys of each server on renewal, which requires a change in the certbot client. This has been implemented as the --reuse-key flag in PR #5901, which was shipped in certbot 0.25. That version was accepted in Debian unstable on 2018-06-12 and migrated to testing on 2018-06-14. It is therefore a candidate for a stretch backport, and I have asked one of the certbot maintainers (hlieberman) when the backport would be updated.

Until then, we can either work on a backport ourselves or wait.

Next steps

4.1 was tested on chat0 and might still be running at the time of writing. Upgrading the other servers should be possible without breakage, but server links will break again when certificates are rotated. The above certtool command can be used to extract the new fingerprint and reconfigure the ircd to match it, but we need to switch to SPKI so that we don’t have to change the config all the time. We also need to make sure certbot does not regenerate the private key when we renew the certificate, something that is supported in principle, although it is unclear how to do this with certbot.
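Until the fingerprint stops changing, updating the config after a rotation could be scripted along these lines. This sketch assumes every connect block shares the same fingerprint (true while one certificate is distributed to the whole cluster); the function name `set_fingerprint` is hypothetical. It writes the result to stdout rather than editing the file in place, so the change can be reviewed first.

```shell
# Rewrite every active (uncommented) fingerprint line in a connects.conf-style
# file; commented-out fingerprint lines are left alone.
#   $1: new fingerprint string, $2: config file. Result goes to stdout.
set_fingerprint() {
    sed -E "s|^([[:space:]]*)fingerprint = \"[^\"]*\";|\\1fingerprint = \"$1\";|" "$2"
}

# Usage sketch:
#   set_fingerprint "SPKI:SHA2-256:<hex>" connects.conf > connects.conf.new
#   diff -u connects.conf connects.conf.new
```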

Checklist

1. fix the 4.1 upgrade and chat0 (done! was a problem with authd)
2. confirm SPKI CERTFP works (done! confirmed in server-to-server between chat0 and chat1 and with the anarcat account with client-side CERTFP)
3. send announcement of the change to the 23 users (done! a PRIVMSG was sent to each of the 23 users. out of those, 3 were away and 5 were offline. a MEMOSERV message was sent to those 8 users, out of those 2 were refusing messages. a GLOBAL notice was also sent.)
4. upload the 4.1 debian package to unstable (done)
5. fix certbot so it does not regenerate the private key on renewal (done! backport complete)
6. wait two weeks (done!)
7. upgrade certbot to backport everywhere, test a renewal to see if the privkeys are kept (done, it works)
8. switch all servers certfp_method to SPKI, send announcements (done)
9. in the meantime, progressively upgrade all servers to 4.1 while the users rebalance across the cluster (postponed)