Overview of how Tahoe works

Tahoe is known as the “Least Authority File System”. Hey, that sounds pretty good! What does that mean? Tahoe is essentially a secure, decentralized, fault-tolerant filesystem. It is encrypted and spread over multiple peers in such a way that it remains available even when some of the peers are unavailable, malfunctioning, or malicious. The one-page summary explains the unique properties of this system. You may also be interested in a PDF which provides a more in-depth description of how the system works.

There is one ‘introducer’, and the other nodes contact it to find out about each other. Nodes reach the introducer through a secret URL called the introducer.furl. Once your node has that information it will contact the introducer, which will then make you part of our private Tahoe grid. The introducer is not a storage node unless you separately set it up to be one.

Ten shares are generated for each file that you upload. Ideally, each storage server should get an equal portion of those ten.
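To make the "equal portion" idea concrete, here is a small arithmetic sketch. It only divides a share count evenly among a number of servers; it is not Tahoe's actual share placement logic, and the function name shares_per_server is made up for this illustration.

```shell
#!/bin/sh
# Illustration only: evenly divide a fixed number of shares among servers.
# This is plain arithmetic, NOT Tahoe's real placement algorithm.
# Usage: shares_per_server TOTAL_SHARES SERVER_COUNT
shares_per_server() {
    total=$1
    servers=$2
    base=$((total / servers))    # shares that every server receives
    extra=$((total % servers))   # leftover shares, one each for the first servers
    i=1
    while [ "$i" -le "$servers" ]; do
        if [ "$i" -le "$extra" ]; then
            echo "server $i: $((base + 1)) shares"
        else
            echo "server $i: $base shares"
        fi
        i=$((i + 1))
    done
}

# Ten shares spread over four storage servers.
shares_per_server 10 4
```

With four servers the ten shares cannot be split equally, so two servers end up holding one share more than the others.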

Known Limitations

There are currently some known limitations to Tahoe, but the biggest issues have been resolved in version 1.4.

Another thing to note is that files put onto the grid before your node joined will not be distributed to your node once it comes online. In other words, new nodes don't receive shares of files that already exist on the grid. So we have to make sure that the grid has enough nodes online for fault tolerance before uploading files that we want to rely on.

There should be a way to replicate a file to a node when it comes online. Such things have been discussed a lot, and earlier projects that are ancestors of Tahoe tried to implement it in various ways. Currently the upstream Tahoe developers feel that this needs to be taken care of by the user or by a higher layer of automation than Tahoe itself; Tahoe has no way of knowing how reliable each server is, for example. So, once some user or higher layer of automation has decided that now is a good time to refresh the share distribution of a certain file, there is an easy way to do it: download the file and reupload it. The developers are also working on a more efficient way to accomplish the same thing, which is due to be released in Tahoe 1.3.0.
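The download-and-reupload approach could be scripted roughly as follows. This is a minimal sketch, assuming the tahoe command is on your PATH and that the file is reachable through an alias such as tahoe: (the alias and the example path are assumptions, not something set up on this page). The helper name refresh_file is made up for this example.

```shell
#!/bin/sh
# Sketch: refresh a file's share distribution by downloading it and
# uploading it again to the same place.
# Usage: refresh_file REMOTE_PATH    (e.g. refresh_file tahoe:backups/example.tar)
refresh_file() {
    remote=$1
    tmp=$(mktemp) || return 1
    # "tahoe get" fetches the file from the grid; "tahoe put" uploads it again.
    tahoe get "$remote" "$tmp" && tahoe put "$tmp" "$remote"
    status=$?
    rm -f "$tmp"    # always remove the temporary local copy
    return "$status"
}
```

The temporary copy is stored unencrypted on the local disk while the function runs, so only do this on a machine you trust with the file contents.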

It is also inadvisable to run a node and an introducer out of the same directory, because each stores its public/private keypair in its directory and the two would conflict. This only matters if you are running an introducer.

To get started

There is no need to edit your sources.list, since tahoe-lafs is available in Debian stable (wheezy). Just run:

apt-get install tahoe-lafs

Now create the directory that will provide storage space to the grid, and create the Tahoe client in it:

dir=/path/to/tahoe-node   # replace with the directory you want to use
mkdir -p "$dir" &&
tahoe create-client "$dir"

Now edit tahoe.cfg in that directory (~/.tahoe/tahoe.cfg if you used the default location) to set your nickname, and set advertised_ip_addresses to the addresses you want to advertise.

Now download the attached introducer.furl and put it in that directory.

NOTE: This introducer.furl is just a test grid, and will likely get destroyed at some point as we figure this out more. There are a couple nodes on this test grid which are only accessible over a riseup VPN link, and at least one public node. It is anticipated that our grid will be completely transported over the intra-collective VPN.
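Since the introducer.furl is a secret, it may be worth copying it into place with restrictive permissions. The following is only a sketch: it follows the text above in treating introducer.furl as a file in the node directory (check the documentation for your Tahoe version), and the helper name install_furl and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch: copy a downloaded introducer.furl into a node directory and
# make it readable only by its owner.
# Usage: install_furl FURL_FILE NODE_DIR
install_furl() {
    src=$1
    node_dir=$2
    # A furl is expected to start with "pb://"; refuse anything else.
    case "$(head -n 1 "$src")" in
        pb://*) ;;
        *) echo "not a furl: $src" >&2; return 1 ;;
    esac
    cp "$src" "$node_dir/introducer.furl" &&
    chmod 600 "$node_dir/introducer.furl"
}
```

For example, install_furl ~/Downloads/introducer.furl "$dir" would put the downloaded file into the node directory created earlier.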

Now start it up from inside that directory:

tahoe start .

If this Tahoe node is on your local machine, you can point your web browser at the Tahoe WUI (web UI). If you are running a Mozilla-based browser you will get an error:

Port Restricted for Security Reasons

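You will need to get around that by adding the port of the web UI to the network.security.ports.banned.override preference (a comma-separated list of ports) in
about:config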

Now you should be able to see the current state of the grid, how many storage nodes exist, etc.

Usage

Need to write more, but here are a few things you can experiment with (try creating a file in the WUI and in the CLI, for example):

allmydata.org/source/tahoe/trunk/docs/u...

Application to backups

There is an experimental plugin for duplicity which has been posted to the “tahoe-dev” mailing list. With this and the garbage collection implemented in version 1.4, Tahoe really looks like the proper tool for our backups.