Monday, February 14, 2011


I recently made the jump from raid6 (like raid5, but with two drives' worth of parity instead of one) to Greyhole (a JBOD pooling application which uses Samba as an access point). Why leave the mature bosom of parity-striped raid? Well, for my situation, it couldn't have made more sense.

What does Greyhole buy you?
  • Instantly add new volumes to your storage pool
    • When I wanted to add a drive to my raid array, the reshape alone could take 24-40 hours! After that I still had to expand the file system, etc. It was a big time sink, and I'd find myself putting it off as long as possible just because I didn't want to deal with it. With Greyhole, a new drive is usable as soon as it's added to the pool.
  • Recycle Bin
    • If you've ever accidentally (or retrospectively regretfully) deleted some data, and lamented the lack of a recycle bin in Linux or on mounted network shares, you'll probably appreciate the extra layer of security that Greyhole provides.
    • Whenever a delete command is issued via the Samba interface, Greyhole moves the file into its "Attic", where it remains until you either empty the "Attic" or delete the file again from the Recycle Bin share.
  • Independently formatted drives
    • If one drive dies, only the data that was on that drive is inaccessible
    • If you remove a drive from your greyhole pool, it is a completely normal, accessible drive which can be mounted and read from as easily as any other drive
  • You can "check out" drives from the storage pool
    • You can notify the Greyhole daemon that a drive is going to be missing, and it will wait on recreating file copies (if you're using Greyhole's file persistence)
  • Selective data redundancy
    • You can set up, per share, how many copies of your files you want persisted.
      • So if you say you want two copies of your personal documents, it will make sure every document has two copies on two different volumes. If one drive dies, another still has your data, and Greyhole will automatically create a new copy of each affected file to replace the missing one.
  • Single storage point
    • Just like raid5/6 and LVM, your storage point appears to be one big, convenient location, but with a couple of perks
      • If you're like me and you keep your backups separate from Greyhole, you don't need to persist multiple copies of files, and since it's not raid, you don't lose any space to parity. This means you get 100% of your drive space in a single storage point!
      • LVM can also offer you a single storage point, but it combines your drives into a virtual volume and spreads a single file system across them. If one drive in the LVM volume dies, your file system is lost, or in a bad way at the least.
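To put rough numbers on the space argument above, here is a back-of-the-envelope sketch. The drive sizes are made up for illustration, the function names are my own, and the pool figure is only an upper bound: it ignores file system overhead and the fact that copies of a file must land on different drives.

```python
def raid6_usable(drive_sizes_tb):
    """Usable space of a raid6 array: two drives' worth goes to parity,
    and every member only contributes as much as the smallest drive."""
    if len(drive_sizes_tb) < 4:
        raise ValueError("raid6 needs at least four drives")
    return (len(drive_sizes_tb) - 2) * min(drive_sizes_tb)


def pool_usable_upper_bound(drive_sizes_tb, copies=1):
    """Upper bound on usable space in a pool that stores whole files,
    keeping `copies` copies of each file."""
    if not 1 <= copies <= len(drive_sizes_tb):
        raise ValueError("copies must be between 1 and the number of drives")
    return sum(drive_sizes_tb) / copies


drives = [2, 2, 2, 2]  # hypothetical pool: four 2 TB drives
print(raid6_usable(drives))                # 4
print(pool_usable_upper_bound(drives, 1))  # 8.0
print(pool_usable_upper_bound(drives, 2))  # 4.0
```

With a single copy per file you keep all 8 TB; at two copies per file you are back to roughly what raid6 would have given you, but you can choose that per share instead of paying it everywhere.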
What doesn't it do?
  • No performance gains.
    • Unlike raid5, you'll get no read boost. Since each file exists wholly on an individual volume (and isn't striped across multiple disks), you'll be getting the bandwidth of one drive, not three or more.
    • Unlike raid0, you won't be writing to or reading from multiple drives at once either, so you're again left with the read/write bandwidth of a single drive
  • Recreating symlinks / missing file copies after losing a volume can take time
    • Unlike raid1, you don't have an immediate, up-to-date copy waiting to be swapped into place. If you're using multiple file copies and a volume dies, it might be a little while before the symlinks to those files and the extra copies are restored and accessible. That makes it a poor fit for situations where data must always be available regardless of the circumstances
  • Greyhole does not work with the native system fs
    • To capture file operations, Greyhole is completely reliant on Samba. If you access your share outside of Samba, Greyhole has no way of knowing what file operations have taken place
    • The workaround is to mount your Samba share locally on the Linux machine, but it is definitely a limitation
In the end, Greyhole is a clever way to simulate a single storage pool by grabbing file operations through Samba and persisting those operations to file copies on a number of pooled storage volumes. It creates the illusion of a central share point by symlinking to files across the various pool volumes. Despite being a relatively new program (it's been out for download for around a year now), it's a fairly stable product with little chance of data loss.
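To make the symlink idea concrete, here is a toy sketch in Python. It is not Greyhole's code and doesn't touch Samba; the `store` and `relink` functions, the directory names, and the "first N volumes" placement rule are all invented for illustration. It writes whole copies of a file to separate "volumes", points a symlink in the "share" at one of them, then re-points the link after a volume disappears.

```python
import os
import shutil
import tempfile


def store(share_dir, volumes, name, data, copies=1):
    """Write `copies` whole copies of a file to distinct volumes and
    point a symlink in the share at the first copy."""
    targets = []
    for volume in volumes[:copies]:
        path = os.path.join(volume, name)
        with open(path, "wb") as fh:
            fh.write(data)
        targets.append(path)
    link = os.path.join(share_dir, name)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(targets[0], link)
    return targets


def relink(share_dir, volumes, name):
    """Re-point the share's symlink at the first surviving copy, if any."""
    link = os.path.join(share_dir, name)
    for volume in volumes:
        candidate = os.path.join(volume, name)
        if os.path.isfile(candidate):
            if os.path.lexists(link):
                os.remove(link)
            os.symlink(candidate, link)
            return candidate
    return None


def demo():
    root = tempfile.mkdtemp()
    try:
        share = os.path.join(root, "share")
        volumes = [os.path.join(root, f"vol{i}") for i in range(3)]
        for directory in [share, *volumes]:
            os.makedirs(directory)
        store(share, volumes, "notes.txt", b"hello", copies=2)

        shutil.rmtree(volumes[0])  # simulate a dead drive
        link = os.path.join(share, "notes.txt")
        broken = not os.path.exists(link)  # the symlink now dangles

        alive = [v for v in volumes if os.path.isdir(v)]
        survivor = relink(share, alive, "notes.txt")
        with open(link, "rb") as fh:
            content = fh.read()
        return broken, os.path.basename(os.path.dirname(survivor)), content
    finally:
        shutil.rmtree(root)


print(demo())
```

The window between the link breaking and `relink` running is the toy version of the availability gap described in the list above: the data is safe on another volume, but the share can't reach it until the link is rebuilt.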

Update 7/20/2011: I've been using Greyhole now for close to half a year and I'm more than satisfied with it. The reason I actually went to Greyhole was simply the flexibility. No more degraded arrays, no more 24+ hour reshaping / recovering. I've lost no data, I've had no issues with data disappearing due to my SATA controller resetting ports as it likes to do, and I've added drives to my pool with a quick and casual ease which raid5 could only dream of. I still manage my backups with my own AnyBackup program, separate from Greyhole's built-in file redundancy, but that is not due to a lack of functionality on Greyhole's part, just to a different use case on mine.


  1. Any pointers to a good guide on setting up greyhole? The official documentation is near non-existent.

  2. The official documentation is actually pretty good. For installs consult: and for usage:

  3. Hi Andrew

    Apologies for addressing you in this manner, because I'm sure there are more appropriate forums, but your blog is really well written and I am sure that you will be able to help!

    I am struggling to understand what directory structures need to be set up on each drive which is to be added to a new Greyhole storage pool. Or does Greyhole create these structures automatically?

    1. I haven't looked at the code lately, and Gboudreau has been busy, but I believe he's done away with any setup on the drive end. You add the mount paths to your greyhole config and the greyhole process will create the directories it needs. Follow along with the latest copies of the INSTALL and USAGE files linked above and you should be fine.

  4. I've been using Greyhole for many years now, and I've had very little data loss that couldn't be recovered (and what there was may have been my own fault). It was easy to set up. I just added my drives to my server (Amahi) and told it to use the drives as part of the pool. The only note to make is that you shouldn't access the files directly on the server; mount the shares locally instead. Also, on my v0.9.2 there is no retention policy on the trash: today I emptied out 1 TB of trash, despite mounting the trash as a shared folder and deleting everything I thought was not needed. I've also found moving the storage pool to another PC, or upgrading the operating system, simple. You set up the shares exactly as they were on the original PC, then run an fsck. After a while, all your files appear in the shares (it recreates the symbolic links for each file it finds on the pooled drives). Initial copying of files took some time for me, as my landing disk was 120 GB and the server is Atom-based. So I either had to copy in batches, or use a copy program that could limit upload speed, so that Greyhole could keep up.