Thoughts on Personal Backups

It's important to make good backups, so I've been thinking about how to make that easy. I have several home directories spread across heterogeneous environments: home machines, laptops, academic computing environments, and a small company.

Attitude

I think that reading Bootstrapping an Infrastructure put me in the right frame of mind to think about making backups easy, automatic, reliable, etc.

In the context of this paper, I am thinking of my home directories as a personal infrastructure built on top of a collection of other computing infrastructures. I want the same properties for my home directories that Bootstrapping wants for professionally administered infrastructures. In particular, I want my home directories to be easily, automatically reconstructed after catastrophes.

My needs for machine-level recovery are more modest than Bootstrapping's, though. I can afford to take a couple of hours to replace a home machine after a failure, because I have several, as long as my data is safe.

That sounds very modest, but it's not. Installing Linux may only take half an hour, but it takes me weeks of calendar time to reinstall all the random packages that I use, and it's easy to forget some security patches and the like. Some kind of automation is in order.

Tools

CVS

I put my home directory under CVS some time ago, and I think it's also good for backups. CVS gives you the freedom to declare large chunks of data expendable, so you don't need to back them up at all. It also forces you to be precise about what's important enough to keep under version control. CVS also gives you some tools to automate the process of sorting the wheat from the chaff, like .cvsignore files and cvs -n update.
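As a rough illustration of the wheat-from-chaff step, here is a minimal Python sketch that runs cvs -n update (a dry run that changes nothing) and groups the reported files by status code. The function names and sample file names are hypothetical, and it assumes the usual one-letter status lines such as "? file" for files CVS doesn't know about and "M file" for locally modified ones.

```python
import subprocess

# Status codes that `cvs update` prints in front of a file name.
STATUS_CODES = "UPARMC?"


def parse_cvs_update(output):
    """Group the status lines of `cvs -n update` output by their one-letter code."""
    by_status = {}
    for line in output.splitlines():
        # Status lines look like "? notes.txt" or "M .bashrc":
        # one code character, one space, then the path.
        if len(line) > 2 and line[1] == " " and line[0] in STATUS_CODES:
            by_status.setdefault(line[0], []).append(line[2:])
    return by_status


def dry_run_update(workdir="."):
    """Run `cvs -n update` in workdir (assumed to be a CVS checkout) and parse it."""
    result = subprocess.run(
        ["cvs", "-n", "update"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cvs_update(result.stdout)
```

Calling dry_run_update() in a checked-out home directory and looking at the "?" entries gives a list of files that are neither under version control nor ignored, which is exactly the pile that needs a keep-or-discard decision.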

These same comments really apply to any version control system that works over the network, like maybe PVCS or whatever. CVS is just what I use.

RSync

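Once you decide you need to move 1 GB worth of old backups from home to work using a 56 kbps modem connection that's only up maybe 8 hours/day, you need RSync. In particular, rsync --partial is quite handy: it keeps partially transferred files around, so an interrupted transfer doesn't have to start over from scratch.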
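For a flaky link like that, one approach is to wrap rsync in a retry loop. The following Python sketch is only an illustration under assumptions: the function names, retry count, and wait time are made up, and the source and destination are whatever you pass in. The rsync options used (--archive, --compress, --partial, and -e ssh) are standard ones.

```python
import subprocess
import time


def rsync_command(source, dest):
    """Build an rsync invocation that keeps partial files and tunnels over ssh."""
    return [
        "rsync",
        "--archive",   # preserve permissions, times, symlinks, and so on
        "--compress",  # worthwhile on a slow modem link
        "--partial",   # keep partially transferred files for the next attempt
        "-e", "ssh",   # use ssh as the remote shell
        source,
        dest,
    ]


def sync_with_retries(source, dest, attempts=5, wait_seconds=60, run=subprocess.call):
    """Re-run rsync until it exits with status 0 or the attempts run out.

    `run` is injectable so the loop can be exercised without a network.
    """
    cmd = rsync_command(source, dest)
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            time.sleep(wait_seconds)  # give the connection time to come back
    return False
```

A call such as sync_with_retries("backups/", "work.example.org:backups/") would then keep trying across dropped connections; the host name here is a placeholder.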

Again, these same comments apply to other serious file distribution programs, like RDist and maybe WGet. RSync has some very useful features, though, and you can't beat the warm, fuzzy feeling of having your files automatically checksummed against the remote copies.

SSH

CVS and RSync run over SSH. Enough said. Check SSH out.

CD-R

For long-term backups, CD-R is the shit. However, because CD-R discs are so small (700MB), I have to think hard about how much data I really need to store. If I could get an entire backup on a single CD, life would be good.
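A quick way to see how far a given tree is from the single-CD goal is to add up its file sizes. This Python sketch does that; the capacity constant is an assumption (700 MB taken as 700 * 1024 * 1024 bytes), the function names are hypothetical, and it ignores filesystem overhead on the disc, so treat the answer as optimistic.

```python
import os

# Assumed usable capacity of one CD-R; real discs lose some space to
# filesystem overhead, so leave a margin in practice.
CD_CAPACITY_BYTES = 700 * 1024 * 1024


def tree_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted (or broken links hit).
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_on_cd(root, capacity=CD_CAPACITY_BYTES):
    """Report whether the tree under root fits within the assumed capacity."""
    return tree_size(root) <= capacity
```

Running fits_on_cd on a fresh checkout of the home directory, rather than on the live directory with all its expendable files, gives the more meaningful number.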

Since I generally get OS distributions on CD for my home machines, and I assume that institutional machines have their own backup infrastructure, I only need to save my customizations to home machines. Rather than backing up my home Unix machines, I'd like to install the OS and run a script, or something like that.

MCrypt

I should be using encryption on my backups, but I'm not thinking about that yet. MCrypt is the obvious choice, but who knows. PGP and GnuPG also come to mind.

-- Peter Szilagyi <szilagyi@alum.mit.edu>