Wednesday, May 25, 2011

AnyBackup Explained

I've been working on AnyBackup for a number of months now. It's gone through some drastic changes in that time, including the language it was written in. I'd like to take a little time now to try to explain exactly what it does and why I think it's useful, or -- at the very least -- why it's useful to me.

The impetus for creating the program was the fact that I have a large amount of data on a single volume (via Greyhole) which I'd like to keep backed up to multiple smaller drives. For a long time I was accomplishing this mostly manually with a little help from some scripts I wrote. There is backup software out there which spans drives, but not exactly in the manner I was looking for. Most backup programs think that disk spanning ends at CDs or DVDs, and I had no intention of feeding several hundred DVDs into my computer once a month; I'd never finish my backups before it was time for the next round! There are a few programs out there that allow spanning across hard drives, but they do a few things that break their usefulness to me.
  • They assume all backup drives are connected
  • They back up one large, proprietary file
I wanted a program with flexibility. So what if backup drive four isn't connected? Let me know that and give me a chance to connect it! What if I want to bring my backup drive to a friend's house? I'd like that data to be readily readable! And so my search faltered, and I gave up. A program that suits these needs may be out there, but it eluded me. So it was then that I decided I'd just make my own.

AnyBackup is powerful and simple. The basic idea: back up any number of drives to any number of drives. Or, more specifically, back up all the files which match your criteria (a list of valid file extensions and a blacklist of regular expressions) on a given set of content drives to a given set of backup drives.
A high-level, crude chart to visualize what AnyBackup does.

The chart above gives a visual idea of what AnyBackup does: take the data from X volumes and back it up to Y volumes. Either side in the chart could be content or backup; it doesn't matter. What matters is that you have enough room on either side to fit all your files.

What AnyBackup does:
  • Easily lets you back up large volumes (in my case an 11 TB Greyhole volume) to several smaller drives
  • Backs up groups of drives to groups of drives
  • Blacklists filenames with regular expressions
  • Lets you provide a list of file extensions you want to back up
  • Identifies drives based on volume name and serial number, so drive letters can change without affecting AnyBackup's record keeping
  • Gives you the flexibility of connecting one backup drive at a time (e.g. using a dock)
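
As a rough illustration of the file criteria described above (an extension list plus a regular expression blacklist), here is a minimal Python sketch. It is not AnyBackup's actual code; the function name `should_back_up`, the example extensions, and the example patterns are all hypothetical.

```python
import re
from pathlib import PurePosixPath

def should_back_up(path, extensions, blacklist):
    """Return True if path has an allowed extension and matches no blacklist pattern."""
    # Compare extensions case-insensitively and without the leading dot.
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    if suffix not in extensions:
        return False
    # A single blacklist hit anywhere in the path excludes the file.
    return not any(re.search(pattern, path) for pattern in blacklist)

# Hypothetical criteria, for illustration only.
extensions = {"mkv", "mp3", "jpg"}
blacklist = [r"(^|/)\.", r"sample"]  # hidden files, and anything containing "sample"

candidates = [
    "movies/film.mkv",
    "movies/film-sample.mkv",
    "music/.hidden.mp3",
    "docs/notes.txt",
]
selected = [p for p in candidates if should_back_up(p, extensions, blacklist)]
print(selected)  # ['movies/film.mkv']
```

Checking the extension first keeps the regular expressions off files that would be skipped anyway.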

What AnyBackup does NOT do:
  • Perform automated backups
    • AnyBackup plays things loose and has no integration with the OS, so it has no way of knowing when files have been updated and requires reindexing before each backup
    • Everything is driven through the GUI; there is no service
    • AnyBackup largely assumes your backup drives won't be connected most of the time
  • Allow multiple backup sets
    • AnyBackup only lets you back up one set of drives to another, whether that's my setup, with one large volume and a handful of backup drives, or several content drives and several backup drives
    • This is filed as an enhancement in the Google Code tracker; if I get time or someone else is inspired, it should be easy enough to set up
  • Back up only certain directories of a drive
    • Right now you can only back up whole drives; you can choose what file types get backed up, but that's the extent of it
    • There's a standing enhancement issue to allow addition of directories in drives instead of just drives
    • Update: This is no longer true as of version 0.9!
  • Perform incremental indexing
    • Since AnyBackup does not have any integration with the OS, it has no idea what has happened to your data between runs, so before a backup you'll need to refresh your drives to make sure all new files have been picked up
    • This isn't necessarily a bad thing. Even if we had incremental indexing, what if you disconnected a drive and took it somewhere? We'd have no way to know whether you added or removed files on another system!
Below is a screenshot of the latest release of AnyBackup (0.8 as of writing this):
AnyBackup 0.8

AnyBackup 0.8 Released


  • Issue 33: Write to user's home dir to avoid UAC conflicts
  • Issue 35: Missing drive prompt crashes AnyBackup
  • Issue 34: Add backup drive lock feature
  • Issue 32: Use last modified time in hash check
  • Issue 37: Allow multiple selections in Drive listNewFiles
  • Issue 36: Remote Indexing
  • Issue 39: Make indexing on drive addition optional
  • Issue 40: Allow addition of multiple drives at once
  • Issue 41: Revise GUI to show backup and content drives at the same time
Notable differences in this version:
  • Remote index server script provided to bypass Samba when indexing a Linux shared directory
    • This GREATLY speeds up indexing of large volumes over Samba; in testing, 20-30 minutes of indexing dropped to ~3 minutes
  • There is no drop-down for selecting Backup or Content drives; all drives are now displayed in one list, with content drives listed in bold

Download at

Saturday, May 14, 2011

Regular Expressions Excluding Strings

I ran into a situation recently where it would be very, very handy to be able to write a regular expression which would both look for certain content and exclude others. I admit this is probably not the most efficient way to go about things, but for small and quick use cases I don't see why it shouldn't be used. See below for some explanations!

Negative lookahead:

Let's say you have a set of strings: 'foobar', 'barbar', 'barfoo'. Now let's further speculate that, for some unknown but perfectly valid (to you) reason, you want only the strings in the above set which contain a 'bar', but only where 'bar' is not followed by 'foo'. (I'm making this distinction now: it's OK to have 'foo' before 'bar', just not after.)

If your regular expression engine supports it (and most do; at least Perl and Python do), you can write something like this:

  • bar(?!foo)
  • Python: re.search('bar(?!foo)', string)
  • Perl: $string =~ /bar(?!foo)/
  • 'foobar' and 'barbar' would match the above regular expression, 'barfoo' would not -- perfect!
Now, as I said, this is for looking ahead. You cannot write something like (?!foo)bar; it will not do what you want, as you're attempting to look behind. Conveniently, a negative lookbehind is covered further down.

Below is a Python snippet to really flesh things out:
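
The original snippet isn't reproduced here, so what follows is a minimal sketch of the negative lookahead against the three example strings. The variable names are my own.

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# 'bar' that is NOT immediately followed by 'foo'.
lookahead = re.compile(r'bar(?!foo)')
lookahead_matches = [s for s in strings if lookahead.search(s)]
print(lookahead_matches)  # ['foobar', 'barbar']

# Putting the lookahead first checks the text starting at 'bar' itself,
# which can never be 'foo', so this excludes nothing.
misplaced = re.compile(r'(?!foo)bar')
misplaced_matches = [s for s in strings if misplaced.search(s)]
print(misplaced_matches)  # ['foobar', 'barbar', 'barfoo']
```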

Negative lookbehind:

We can use the same list as above to demonstrate a lookbehind, but this time let's assume we only want strings which contain 'bar', but only where 'bar' is not preceded by 'foo'.

We can write a regular expression for negative lookbehinds like this:
  • (?<!foo)bar
  • Python: re.search('(?<!foo)bar', string)
  • Perl: $string =~ /(?<!foo)bar/
  • 'barfoo' and 'barbar' would match the above regular expression, 'foobar' would not. Again, exactly what we set out to do!
Another Python snippet below:
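
Again the original snippet isn't reproduced here, so this is a minimal sketch of the negative lookbehind run over the same three strings, with variable names of my own choosing.

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# 'bar' that is NOT immediately preceded by 'foo'.
lookbehind = re.compile(r'(?<!foo)bar')
lookbehind_matches = [s for s in strings if lookbehind.search(s)]
print(lookbehind_matches)  # ['barbar', 'barfoo']
```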


The only thing which makes these lookaheads and lookbehinds negative is the exclamation point. You can easily turn the requirement around by replacing it with an equals sign, so bar(?=foo) would suddenly make 'barfoo' the only valid string in our set, and (?<=foo)bar would do the same for 'foobar'. Pretty intuitive!
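
For completeness, a small sketch of the positive forms on the same strings. In Python's re module (and in Perl) the positive variants are written with `=` in place of `!`; the variable names are my own.

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Positive lookahead: 'bar' that IS immediately followed by 'foo'.
positive_ahead = [s for s in strings if re.search(r'bar(?=foo)', s)]
print(positive_ahead)  # ['barfoo']

# Positive lookbehind: 'bar' that IS immediately preceded by 'foo'.
positive_behind = [s for s in strings if re.search(r'(?<=foo)bar', s)]
print(positive_behind)  # ['foobar']
```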