I recently made the jump from raid6 (raid5 with two drives' worth of parity instead of one) to Greyhole (a JBOD pooling application which uses Samba as an access point). Why leave the mature bosom of parity-striped raid? Well, for my situation, it couldn't have made more sense.
What does Greyhole buy you?
- Instantly add new volumes to your storage pool
- When I wanted to add a drive to my raid array, the reshape alone could take 24-40 hours! After that I still had to expand the file system, etc. It was a big time sink, and I'd find myself putting it off as long as possible just because I didn't want to deal with it.
- Recycle Bin
- If you've ever accidentally (or retrospectively regretfully) deleted some data, and lamented the lack of a recycle bin in Linux or on mounted network shares, you'll probably appreciate the extra layer of security that Greyhole provides.
- Whenever a delete command is issued via the Samba interface, Greyhole moves the file into its "Attic", where it remains until you either empty the "Attic" or delete the file again from the Recycle Bin share.
- Independently formatted drives
- If one drive dies, only the data that was on that drive is inaccessible.
- If you remove a drive from your Greyhole pool, it is a completely normal, accessible drive which can be mounted and read from as easily as any other drive.
- You can "check out" drives from the storage pool
- You can notify the Greyhole daemon that a drive is going to be missing, and it will hold off on recreating file copies (if you're using Greyhole's file persistence) until the drive comes back.
- Selective data redundancy
- You can set up, by share, how many copies of your files you want persisted.
- So if you say you want two copies of your personal documents, it will make sure every one of your documents has two copies on two different volumes. If one drive dies, another will still have your data, and Greyhole will automatically persist a new copy of each affected file to replace the missing one.
- Single storage point
- Just like raid5/6 and LVM, your storage point appears to be one big, convenient location, but with a couple of perks.
- If you're like me and you keep your backups separate from Greyhole, you don't need to persist multiple copies of files, and since it's not raid, you don't lose any space to parity. This means you get 100% of your drive space in a single storage point!
- LVM can also offer you a single storage point, but it combines your drives into a virtual volume and spreads a single file system across them. If one drive in the LVM volume dies, your file system is lost, or in a bad way at the least.
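To make the selective redundancy idea concrete, here is a minimal Python sketch of per-share copy counts and of replacing copies after a drive dies. It is only an illustration, not Greyhole's actual code: the function names (`pick_volumes`, `re_replicate`), the share and disk names, and the "most free space first" placement rule are all my own assumptions.

```python
from typing import Dict, Iterable, List


def pick_volumes(free_space: Dict[str, int], copies: int,
                 exclude: Iterable[str] = ()) -> List[str]:
    """Pick `copies` distinct volumes, most free space first.

    The "most free space first" rule is an assumption for this sketch.
    """
    skipped = set(exclude)
    candidates = [name for name in free_space if name not in skipped]
    if copies > len(candidates):
        raise ValueError("not enough volumes for the requested copies")
    # Sort by free space (descending), then by name so ties are stable.
    candidates.sort(key=lambda name: (-free_space[name], name))
    return candidates[:copies]


def re_replicate(placement: Dict[str, List[str]], dead: str,
                 free_space: Dict[str, int],
                 copies_per_share: Dict[str, int]) -> Dict[str, List[str]]:
    """Return a new placement after the volume `dead` has been lost.

    Files with a surviving copy get new copies on other volumes.
    Files whose only copy lived on `dead` end up with an empty list.
    """
    alive = {name: free for name, free in free_space.items() if name != dead}
    repaired: Dict[str, List[str]] = {}
    for path, volumes in placement.items():
        survivors = [name for name in volumes if name != dead]
        share = path.split("/", 1)[0]  # hypothetical "Share/file" layout
        missing = copies_per_share[share] - len(survivors)
        if survivors and missing > 0:
            survivors += pick_volumes(alive, missing, exclude=survivors)
        repaired[path] = survivors
    return repaired


if __name__ == "__main__":
    free = {"disk1": 500, "disk2": 300, "disk3": 800}  # free GB per volume
    wanted = {"Documents": 2, "Movies": 1}             # copies per share
    placement = {
        "Documents/taxes.pdf": pick_volumes(free, wanted["Documents"]),
        "Movies/film.mkv": pick_volumes(free, wanted["Movies"]),
    }
    print(placement)
    print(re_replicate(placement, "disk3", free, wanted))
```

In the demo, the share with two copies survives the loss of a drive and gets topped back up to two copies, while the single-copy share loses whatever lived on the dead drive. That is exactly the trade you make when you keep backups elsewhere and run with one copy to use all of your space.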
What doesn't it do?
- No performance gains.
- Unlike raid5, you'll get no read boost. Since each file exists wholly on an individual volume (and isn't striped across multiple disks), you'll be getting all the bandwidth one drive can offer, not three or more.
- Unlike raid0, you won't be writing to or reading from multiple drives at once either, so you're again left with the read/write bandwidth of a single drive.
- Recreating symlinks and missing file copies after a volume holding multi-copy files goes missing can take time
- Unlike raid1, you don't have an immediate, up-to-date mirror waiting to be swapped into place. If you're using multiple file copies and a volume dies, it might be a little while before the symlinks to those files and the extra copies of those files are restored and accessible, so it isn't a good fit for a situation which calls for data to always be available regardless of the circumstances.
- Greyhole does not work with the native system fs
- To capture file operations, Greyhole is completely reliant on Samba. If you access your share outside of Samba, Greyhole has no way of knowing what file operations have taken place.
- The workaround is to mount your Samba share locally on the Linux machine, but it is definitely a limitation.
In the end, Greyhole is a clever way to simulate a single storage pool by grabbing file operations through Samba and persisting those operations to file copies on a number of pooled storage volumes. It creates the illusion of this central share point by symlinking to files across the various pool volumes. Despite being a relatively new program (it's been out for download for around a year now), it's a fairly stable product with little chance of data loss.
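The symlink trick is easy to try out for yourself. The Python sketch below is only a toy version of the idea, assuming a POSIX system where symlinks are available: the real file is written to a pool volume, and the share directory only holds a symlink pointing at it. The `publish` helper and the directory names are hypothetical and are not part of Greyhole.

```python
import os
import tempfile


def publish(share_dir: str, volume_dir: str, name: str, data: bytes) -> str:
    """Write the real file to a pool volume and symlink it into the share."""
    os.makedirs(volume_dir, exist_ok=True)
    os.makedirs(share_dir, exist_ok=True)
    real_path = os.path.join(volume_dir, name)
    with open(real_path, "wb") as handle:
        handle.write(data)
    link_path = os.path.join(share_dir, name)
    # Replace any stale link so the share always points at the new copy.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(real_path, link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        share = os.path.join(root, "share")
        publish(share, os.path.join(root, "disk1"), "a.txt", b"hello")
        publish(share, os.path.join(root, "disk2"), "b.txt", b"world")
        # The share looks like one directory, but every entry is a link
        # to a file that lives on one of the pool volumes.
        for entry in sorted(os.listdir(share)):
            link = os.path.join(share, entry)
            print(entry, "->", os.readlink(link))
```

Because each file sits whole on an ordinary file system, pulling a drive out of the pool only breaks the links that point at it. The files themselves stay readable on that drive.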
Update 7/20/2011: I've been using Greyhole now for close to half a year and I'm more than satisfied with it.
I actually went to Greyhole simply for the flexibility. No more degraded arrays, no more 24+ hour reshaping / recovering. I've lost no data, I've had no issues with data disappearing due to my SATA controller resetting ports as it likes to do, and I've added drives to my pool with a quick and casual ease which raid5 could only dream of. I still manage my backups with my own AnyBackup program, separate from Greyhole's built-in file redundancy, but that is not due to a lack of functionality on Greyhole's part, just to a different use case on mine.