How to synchronize The Library

Background information and a general Standard Operating Procedure (SOP) for Librarians to reference when synchronizing Library branches.

This guide is written to help familiarize Librarians with the process of synchronizing their Library branches with each other and, optionally, with their personal computers. Synchronization is performed using rsync, so both the local and remote machines must have the rsync(1) command installed. If you are the system administrator for a Library branch, see the Tech Autonomy Infrastructure committee’s "Rsync" page for details on installing and configuring an rsync server.

This page describes basic procedures for:

  • setting up a local copy of your Library branch on your personal computer,
  • downloading new books to your personal copy that may have been added by a fellow Librarian managing the same branch as you, and,
  • uploading new books to the Library branch.

We assume you have a working knowledge of the command line. If you are not comfortable using your computer’s command line interface, please complete one of the many free instructional courses available online, such as Codecademy’s Learn the Command Line or Taming the Terminal, before attempting these procedures.

Background

Since mid-2017, Tech Autonomists in various cities have offered comrades access to numerous shared community resources. One such resource is a massive and growing collection of digitized texts, which we simply call “The Library.” Much like a physical-world Public Library, the Tech Autonomists’ Library is operated by “Librarians” who add, catalogue, and organize the texts therein. In the physical world, Public Libraries must keep a record of which patrons checked out which books, and stipulate that they be returned within a certain time. This condition is necessitated by a simple physical fact: a printed book can only be in one patron’s hands at a time. In a digital context, this condition makes no sense, and so Tech Autonomists do not impose it.

In other words, The Library serves as a geographically dispersed network of private e-book distribution points, much like a physical Public Library. However, unlike a physical Public Library, our Library operates without the unnecessary and oppressive hindrance of overbearing oversight, censorship, or physical-world constraints that government- or university-run libraries must contend with. For example, library patrons need not return any items they have checked out, because “checking out” just means downloading a copy of the item, not removing it from circulation; copying is not theft.

Also akin to the Public Library model in the physical world, there are more library patrons than there are librarians. This means that a small number of librarians who coordinate effectively with one another can serve orders of magnitude more patrons than their own number. This guide aims to equip you with the basic skills and knowledge you need to coordinate effectively with fellow Tech Autonomists who count themselves among the growing number of Librarians who manage one or more branches of this shared Library.

Please refer to the Library page for additional background information.

Introduction

Librarians are responsible for the proper cataloguing, updating, and sharing of Library items between themselves. Your fellow Librarians may be managing the same branch as you are, in which case we call them local Librarians, or they may be managing Library branches many hundreds of miles away from you, in which case we call them remote Librarians. A single Library branch can have any number of local Librarians managing it, although no more than one is strictly needed per branch. Most existing branches today have one or two Librarians each.

A Librarian’s main purpose is to make sure that the contents of their own Library branch are correctly catalogued. This means that items are correctly tagged and sorted, so that they are easily findable. Librarians must use a graphical Free Software tool called Calibre to manage this metadata, along with the digital files that make up the texts (e-books) themselves. As a Librarian, you will need to acquire a copy of this software and familiarize yourself with it so that you can make changes to the Library’s catalog.

Understanding Calibre

A Calibre Library is composed of two important pieces. These are:

  • the Calibre Library filesystem hierarchy, and,
  • the Calibre metadata database.

Understanding the Calibre Library filesystem hierarchy

Calibre functions much “like iTunes, but for books.” When you add a book, the book’s file (the PDF, EPUB, Mobipocket, etc.) is copied from its original location into a folder on your computer managed by Calibre. This folder is known as your “Calibre Library folder” or “Calibre Library directory.” You can name this folder anything you want, and you can place this folder anywhere you want. In the instructions below, replace Calibre Library with the name you gave to the folder you told Calibre to manage for you.

Since the books in your copy of the Library are all located in a single directory/folder, and this folder is managed by Calibre itself, one useful way to think about this folder is as though it were a single file, rather than a collection of files. You should never need to manually open this folder or move files around in it. In fact, doing so will confuse Calibre and will likely cause headaches for other Librarians. Please do not do this.

Instead of editing or moving book files around yourself, make changes using Calibre’s graphical user interface. (There is also a command line interface to Calibre, useful for automated operations.) This holds true for every editing operation, including minor changes such as fixing typos in author names, titles, publication dates, and so on.
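For instance, Calibre ships a command line tool called calibredb. The sketch below is a minimal, hypothetical helper (the name library_titles is ours, not Calibre’s) that asks calibredb to print a few titles and authors from a Calibre Library folder without opening the graphical interface. It uses calibredb’s list command with the --with-library, --fields, and --limit options; confirm these with calibredb list --help on your own machine before relying on them.

```shell
# Hypothetical helper: print up to N titles and authors from a Calibre Library.
# Usage: library_titles "/path/to/Calibre Library" [N]
library_titles() (
  lib=$1
  count=${2:-10}
  # Refuse to run against a folder that does not look like a Calibre Library.
  if [ ! -f "$lib/metadata.db" ]; then
    printf 'library_titles: no metadata.db in %s\n' "$lib" >&2
    return 1
  fi
  # Query through Calibre's own CLI instead of touching the folder by hand.
  calibredb list --with-library "$lib" --fields title,authors --limit "$count"
)
```

For example, library_titles "$HOME/Calibre Library" 5 lists five entries.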

Understanding the Calibre metadata database

The Calibre metadata database is a single file called metadata.db, placed in the root of your Calibre Library folder. This file contains numerous indexes of additional information regarding every item Calibre knows about. Book titles, author names, custom tags, human languages, book reviews, and any other data you’d like to associate with a text beyond the text content itself are written to this file.

This arguably makes the metadata.db file the single most important file in your Calibre Library. The file is an SQLite database, but all you need to know is that every time you update the Library’s catalog, Calibre writes your change into at least this file and, depending on the change you made, possibly others as well. That’s why, whenever you synchronize with another copy of your Library branch after the catalog has changed, you should expect this file to be among the files transferred.
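Because so much depends on this one file, it can be worth sanity-checking it before you push. The sketch below is a hypothetical pre-flight check (check_catalog is our name, not part of Calibre) that only confirms metadata.db exists, is non-empty, and begins with the standard SQLite 3 file header. It never opens or modifies the database, and it cannot tell you whether the catalog’s contents are correct.

```shell
# Hypothetical pre-flight check: does this folder hold a plausible Calibre catalog?
# Usage: check_catalog "/path/to/Calibre Library"
check_catalog() (
  db="$1/metadata.db"
  if [ ! -s "$db" ]; then
    printf 'check_catalog: %s is missing or empty\n' "$db" >&2
    return 1
  fi
  # Every SQLite 3 database begins with the 16-byte header "SQLite format 3\0";
  # comparing only the first 15 bytes avoids handling the NUL byte in the shell.
  if [ "$(head -c 15 "$db")" != "SQLite format 3" ]; then
    printf 'check_catalog: %s is not an SQLite database\n' "$db" >&2
    return 1
  fi
)
```

For example, check_catalog "$HOME/Calibre Library" && echo "catalog looks sane".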

Much like the Calibre Library folder, this file should never be edited or opened manually. Let Calibre manage it. It knows what it’s doing. :)

Understanding synchronization

Synchronization is simply a matter of copying the latest changes from one location into another. As a Librarian who can make changes to the Library’s catalog, you need to make sure that you don’t accidentally overwrite another Librarian’s changes when you make yours. This is why synchronizing is important: it makes you aware of any changes someone else may have made between your last edit and the edit you are about to make. In other words, check that no one else has edited the catalog since you last checked in on it.

One potentially helpful mnemonic is to treat the Library’s catalog as though it were a physical notebook placed on a shelf. To make a change, you must:

  1. Take the catalog off the shelf,
  2. edit it, by writing some notes into the catalog, and finally,
  3. return the catalog to the shelf, that is, put it back where you found it.

This is the same way you hopefully treat shared physical objects in your possession, such as when doing dishes at home: you take a bowl from the cabinet, make some changes to it (put food in it), then wash it and return it to the cabinet more or less the way you found it. This way, more than one person can use the dishes, the kitchen, or whatever other physical resource is being shared, one after the other. As long as you are conscientious about your use of shared resources, such as the Library’s catalog or your dishes, it is relatively straightforward for multiple people to accomplish very complex tasks with those resources.

This conscientiousness and rigorous attention to procedural detail, coupled with a philosophical understanding of its importance, is arguably the single most important trait that Librarians must exemplify throughout their work. As a Librarian, you will be expected to habituate yourself to this level of digital tidiness; it is very obvious to other Librarians when you deviate from the synchronization procedure. Don’t worry, you will not be punished in any way. You will simply be expected to do better next time.

If you continue to cause problems for other Librarians over time by failing to follow this procedure, those other Librarians will probably stop paying attention to edits you make, as they should. This means you may be able to receive new items from them, but you will not be able to make changes to their catalogs yourself. In other words, you will need to speak to another human Librarian and request they make the changes for you, which will of course be their choice to make or not as they see fit. Alternatively, you can take your copy of the Library and do with it as you wish, on your own. (I.e., we say you can “go fork yourself.”) ¯\_(ツ)_/¯

Understanding rsync command invocation

To synchronize the catalog and its filesystem, Librarians use rsync. In all cases, the rsync command invocation is almost identical. The command synopsis is:

rsync --compress --recursive --times --delete $LIBRARY_SOURCE/ $LIBRARY_DESTINATION

This command breaks down as follows, but see rsync(1) in the manual for more information, of course:

  • rsync – Invoke rsync.
  • --compress – Compress the data stream. This makes transfers faster over slow network connections at the expense of CPU cycles. It is recommended to always use this, given how powerful personal computers have become.
  • --recursive – Copy the entire directory hierarchy tree, not just a single folder.
  • --times – Preserve modification times and use that filesystem metadata as part of the comparison to ensure only files that have changed are transferred.
  • --delete – Delete any files on the destination not found in the source, so as to remove redundant files, and avoid duplications or orphaned files.
  • $LIBRARY_SOURCE/ – Replace this with the source you wish to copy from; see examples of this in the synchronization commands below. Retain the trailing slash, which tells rsync to copy the contents of the source, not the folder itself. When the source argument is your workstation, you are performing a push. When the source argument is the Library branch itself, you are performing a pull.
  • $LIBRARY_DESTINATION – Replace this with the destination you wish to copy to; see examples of this in the synchronization commands below. When the destination argument is your workstation, you are performing a pull. When the destination argument is the Library branch itself, you are performing a push.

Depending on the way you make your connection, you may also need (or simply desire) to use one or more of the following options:

  • -e – Use the specified program as a remote shell, and execute it as described by the option’s value. For example:
    rsync -e "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE -l SSH_USER_NAME" […]
    
    • "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE -l SSH_USER_NAME" – Make an SSH connection (ssh) using the identity file (-i) indicated in this path ($HOME/.ssh/SSH_IDENTITY_FILE), and log in (-l) with the remote user name SSH_USER_NAME.
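  • --password-file – Read the password for the given rsync user account from the first line of the given file. This applies only to connections to an rsync daemon (rsync:// addresses), not to SSH connections. For example: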
    rsync --password-file /path/to/your/rsync-client.secret rsync://RSYNC_USER_NAME@example.local/MODULE/
    

    The --password-file option is only necessary if all of the following conditions are true:
    • the rsync server servicing your request demands that you use a password,
    • you want to avoid typing your password each time you make such a request, and
    • you want to avoid exposing your password to other processes capable of inspecting your command invocation.
    Otherwise, you may still want to use this option, but you are not required to.
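rsync refuses to use a password file that other users can read, so the file needs owner-only permissions. The sketch below shows one hypothetical way to create it (make_password_file is our name, and the file path is a placeholder). It reads the password from standard input, so when you type it at the prompt it never appears on a command line, in the process list, or in your shell history.

```shell
# Hypothetical helper: store an rsync daemon password read from standard input.
# Usage: make_password_file /path/to/your/rsync-client.secret
make_password_file() (
  umask 077                      # new files are created readable by you alone
  IFS= read -r password || [ -n "$password" ] || return 1
  # rsync reads only the first line of the file as the password.
  printf '%s\n' "$password" > "$1" || return 1
  chmod 600 "$1"                 # also tighten permissions on a pre-existing file
)
```

Run make_password_file /path/to/your/rsync-client.secret, type the password, and press Enter; wrap the call in stty -echo and stty echo if you do not want the password shown on screen.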

If you are experiencing trouble sending or receiving larger files, you may also find it useful to use:

  • --partial – Saves incomplete files during the synchronization process so that their transfer may be resumed after an interruption or broken connection.

If you wish, you can also include the following options:

  • --progress – Display a progress meter for each transfer.
  • --human-readable – Display numbers in human-readable terms, such as “1K” instead of “1024” (bytes).
  • --verbose – Output more information than the default. This is sometimes helpful while troubleshooting.

Prior to syncing

This guide assumes you (or your system administrator) have already configured a Library branch on a device to which you can connect. There are numerous ways to “connect” to your Library branch. Your Library branch’s system administrator should explain to you the precise mechanism by which you are expected to connect. The rest of this guide describes the most common setups.

Before you synchronize, however, be certain you are familiar with the Calibre software itself, perhaps by practicing with a small Calibre Library of your own.

Since every synchronization mechanism we discuss uses rsync, remember also that if you are feeling unsure about your commands, you can include the --dry-run option in your rsync command invocation to preview your actions. We recommend you do this each time you are making substantial edits to ensure your command will actually do what you think it will do before you execute it “for real.”
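One way to make that preview a habit is a small function that always combines --dry-run with --itemize-changes, which prints one line per file that would be transferred or deleted. The name library_preview is hypothetical; the options themselves are standard rsync options.

```shell
# Hypothetical helper: preview a sync without changing anything.
# Usage: library_preview SOURCE DESTINATION
library_preview() (
  # --dry-run makes no changes; --itemize-changes lists each file that
  # would be sent, received, or deleted. A line ending in " metadata.db"
  # means the catalog itself would change.
  rsync --dry-run --itemize-changes --compress --recursive --times --delete \
    "${1%/}/" "$2"
)
```

Pass the same SOURCE and DESTINATION you would give the real command. Because torsocks wraps programs rather than shell functions, over Tor you would run the same rsync options under torsocks directly.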

Rsync over Tor

In this configuration, you make an rsync connection to a Tor Onion service. This means you may first need to configure your Tor client. See Connecting to an authenticated Onion service for details on Tor client configuration. You will also need to Torify your rsync invocation.

“Torifying” simply means to proxy your network request through the Tor network, rather than directly to the Internet. There are numerous ways to do this. See the Tor project’s page on torification for more complete details than this section covers.

The easiest way to torify your connection is probably torsocks(1), which you can install by invoking sudo apt update && sudo apt install torsocks on most Debian-derived operating systems. You can also use nc(1), which may already be installed on your system; the commands below require a version of nc that supports the -x proxy option, such as OpenBSD’s netcat.

  1. Start with Calibre closed (quit).
  2. Now download (“pull”) a copy of the Library’s most recent changes to your workstation, replacing MODULE with the name of the exported rsync module provided to you by your system administrator:
    # Torification using torsocks:
    torsocks rsync --compress --recursive --times --delete rsync://RSYNC_USER_NAME@abcdef0123456789.onion/MODULE/ "/path/to/Calibre Library/"
    # Alternatively, torify using nc (9150 is Tor Browser's SOCKS port; a standalone tor daemon usually listens on 9050):
    #RSYNC_CONNECT_PROG='nc -x 127.0.0.1:9150 %H 873' rsync --compress --recursive --times --delete rsync://RSYNC_USER_NAME@abcdef0123456789.onion/MODULE/ "/path/to/Calibre Library/"
    
  3. At this point, you can open Calibre and make whatever changes you need to.
  4. When you’re done, close (quit) Calibre.
  5. Upload (“push”) a copy of the updated Library and catalog contents back to your Library’s Onion service:
    # Torification using torsocks:
    torsocks rsync --compress --recursive --times --delete "/path/to/Calibre Library/" rsync://RSYNC_USER_NAME@abcdef0123456789.onion/MODULE/
    # Alternatively, torify using nc (9150 is Tor Browser's SOCKS port; a standalone tor daemon usually listens on 9050):
    #RSYNC_CONNECT_PROG='nc -x 127.0.0.1:9150 %H 873' rsync --compress --recursive --times --delete "/path/to/Calibre Library/" rsync://RSYNC_USER_NAME@abcdef0123456789.onion/MODULE/
    

Rsync over SSH (optionally over Tor)

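In this configuration, rsync uses ssh as its remote shell (the -e option), so your rsync session runs over an SSH connection. The SSH connection may or may not itself be routed over Tor; in either case, the rsync command invocation is the same. Replace abcdef0123456789.onion in the commands below with the correct name of the server hosting your Library. Note that the backslash before the space in the remote path is meant for the remote shell; rsync 3.2.4 and later escape remote paths automatically, so with those versions either remove the backslash or add the --old-args option.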

  1. Start with Calibre closed (quit).
  2. Now download (“pull”) a copy of the Library’s most recent changes to your workstation:
    rsync --compress --recursive --times --delete -e "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE" SSH_USER_NAME@abcdef0123456789.onion:"/path/to/Calibre\ Library/" "/path/to/your/copy/of/the/Library/"
    
  3. At this point, you can open Calibre and make whatever changes you need to.
  4. When you’re done, close (quit) Calibre.
  5. Upload (“push”) a copy of the updated Library and catalog contents back to your Library’s rsync endpoint:
    rsync --compress --recursive --times --delete -e "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE" "/path/to/your/copy/of/the/Library/" SSH_USER_NAME@abcdef0123456789.onion:"/path/to/Calibre\ Library/"
    

Rsync over SSH over Tor

This variation uses the same rsync commands as the previous section, but routes the SSH session itself through Tor, ultimately connecting to a (possibly authenticated) Onion service. To ensure that this is so, make certain your ~/.ssh/config file contains the following configuration near the top of the file, replacing 9050 with the port set by your Tor client’s SocksPort directive (described in the Tor manual). Placement matters because ssh uses the first value it obtains for each option, so an earlier Host block that sets its own ProxyCommand would take precedence:

# Make it easier to use Tor Onion services.
Host *.onion
    ProxyCommand nc -x 127.0.0.1:9050 %h %p
    # If you prefer to use socat(1) instead of nc(1), use the following line instead:
    #ProxyCommand socat - SOCKS4A:localhost:%h:%p,socksport=9050

This snippet ensures that whenever ssh is invoked with a host name (i.e., a server address) ending in .onion, it connects through the SOCKS proxy running on IP address 127.0.0.1 and listening on port 9050, which is the default SOCKS port of a standalone tor daemon (Tor Browser uses 9150 instead).
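If you would rather not edit ~/.ssh/config by hand, the following hypothetical helper (add_onion_proxy is our name) writes the nc variant of the block above. It prepends the block rather than appending it, so that it lands near the top of the file as recommended, and it does nothing if a Host *.onion block is already present. It assumes an nc(1) that supports -x, such as OpenBSD’s netcat.

```shell
# Hypothetical helper: prepend the Host *.onion block to an ssh config file.
# Usage: add_onion_proxy [CONFIG_FILE] [SOCKS_PORT]
add_onion_proxy() (
  config=${1:-"$HOME/.ssh/config"}
  port=${2:-9050}
  umask 077                              # new files and folders are private to you
  # Leave the file alone if it already has an onion block.
  if [ -f "$config" ] && grep -q '^Host \*\.onion' "$config"; then
    return 0
  fi
  mkdir -p "$(dirname "$config")" || return 1
  tmp=$(mktemp "$config.XXXXXX") || return 1
  {
    printf '# Make it easier to use Tor Onion services.\n'
    printf 'Host *.onion\n'
    printf '    ProxyCommand nc -x 127.0.0.1:%s %%h %%p\n\n' "$port"
    if [ -f "$config" ]; then cat "$config"; fi
  } > "$tmp" && mv "$tmp" "$config"
)
```

For example, add_onion_proxy "$HOME/.ssh/config" 9150 if you use Tor Browser’s proxy. Afterwards, ssh -G abcdef0123456789.onion | grep -i proxycommand should show the setting ssh will actually apply.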

Additional notes

Already made changes in Calibre before you pulled down?

If you know that you have the most up-to-date copy of the Library, but you have already made changes to some new books in Calibre, do not pull before pushing. You will overwrite your work! Basically, you have skipped the “take” step and started instead with “edit,” which means the next step is “return” before anything else.

Things to notice during synchronization

First, after the message “receiving incremental file list” (printed when you use --verbose), you should see metadata.db being pulled down, possibly after any deletions performed by --delete.

If you do not see this, you may need to sync down from a clean copy of the Library from another branch.

Also notice how many files are being synced. If you know that only a few books have been added to your Library branch recently, the sync should be fairly quick. If it is taking much longer, you may be accidentally copying every file (for example, into a new nested folder because of a missing trailing slash), in which case you should cancel the procedure and try again.

Make sure that you have included trailing slashes (as in x.local:"Calibre\ Library/") and the complete path to your local library copy.

  1. Perform the “pull” once more, especially if your first pull took more than five minutes. This is to ensure that your copy is properly synced in the case of anyone adding new books in the time it took to perform the first pull.
  2. Once your local copy is updated, you can either stop there or go on to add new books.
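To check these things without reading every line of output, you can run your pull with --itemize-changes (optionally together with --dry-run) and summarize the result. The awk filter below is a hypothetical sketch (summarize_sync is our name): it counts the lines rsync’s itemized output marks as file transfers (starting with < or >) and as deletions (*deleting), and reports whether the top-level metadata.db is among the transferred files.

```shell
# Hypothetical filter: summarize `rsync --itemize-changes` output read from stdin.
# Usage: rsync --dry-run --itemize-changes ... | summarize_sync
summarize_sync() (
  awk '
    /^[<>]f/                    { files++ }      # a file to be sent or received
    /^\*deleting /              { deletions++ }  # a file --delete would remove
    /^[<>]f[^ ]* metadata\.db$/ { catalog = 1 }  # the top-level catalog file
    END {
      printf "files to transfer: %d\n", files
      printf "files to delete: %d\n", deletions
      printf "catalog (metadata.db) changed: %s\n", (catalog ? "yes" : "no")
    }
  '
)
```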

Command aliases

If you feel comfortable enough to edit your .bashrc file to make a permanent command alias, you can do so by appending the following to your ~/.bashrc, or one of your other Bash shell startup files. Change ALIASNAME to what you want the alias to be called, along with whatever other parameters are required for successful execution. Remember that after editing your .bashrc file you must close your shell and re-open it, or source the .bashrc file (i.e., invoke source ~/.bashrc) before you can use your new alias.

Command aliases for rsync over Tor

An alias for listing the files contained in a given exported module, or downloading (pulling, copying) those files when a destination is appended during invocation:

alias the-library="torsocks rsync --password-file /PATH/TO/YOUR/RSYNC/PASSWORD/FILE --partial -zvrth --progress --delete rsync://RSYNC_USER_NAME@abcdef0123456789.onion/MODULE/"

Example uses of the above command alias:

  • Ask rsync to list the contents of the exported module:
    the-library # Invokes the alias verbatim. Without a destination in this configuration, lists files.
    
  • Ask rsync to download (pull) the Library’s contents to the /tmp/test folder:
    the-library /tmp/test # Invokes the alias and then appends a library destination, effectively a pull.
    

Command aliases for rsync over SSH

Notice how carefully the single quotes in these aliases must be escaped so that they pull and push correctly when syncing your library.

An alias for downloading (pulling):

# Pull down from the Library branch.
alias ALIASNAME='rsync -zrvth --progress --delete -e "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE" USER@LIBRARYNODE.local:'"'"'Calibre\ Library/'"'"' ~/PATH/TO/LIBRARY'

And an alias for uploading (pushing):

# Push up to the Library branch.
alias ALIASNAME='rsync -zrvth --progress --delete -e "ssh -i $HOME/.ssh/SSH_IDENTITY_FILE" ~/PATH/TO/LIBRARY/ USER@LIBRARYNODE.local:'"'"'Calibre\ Library/'"'"''
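Shell functions are one way to avoid most of that quoting, because each value is quoted only once, where it is defined. The sketch below is a hypothetical alternative to the two aliases above; library_pull, library_push, and the LIBRARY_* variables are placeholder names. The remote path keeps the escaped space exactly as the aliases do, so adjust its escaping to whatever your rsync version requires.

```shell
# Hypothetical ~/.bashrc functions replacing the two aliases above.
# Placeholders: set these once. Single quotes keep the backslash literal.
LIBRARY_REMOTE='USER@LIBRARYNODE.local:Calibre\ Library/'
LIBRARY_LOCAL="$HOME/PATH/TO/LIBRARY"
LIBRARY_SSH="ssh -i $HOME/.ssh/SSH_IDENTITY_FILE"

# Pull down from the Library branch. Extra options (e.g. --dry-run) pass through.
library_pull() {
  rsync -zrvth --progress --delete -e "$LIBRARY_SSH" "$@" "$LIBRARY_REMOTE" "$LIBRARY_LOCAL"
}

# Push up to the Library branch; the local source always gets a trailing slash.
library_push() {
  rsync -zrvth --progress --delete -e "$LIBRARY_SSH" "$@" "${LIBRARY_LOCAL%/}/" "$LIBRARY_REMOTE"
}
```

For example, library_pull --dry-run previews a pull before you run it for real.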