Saturday, October 29, 2011

Clustered Handbrake Encoder

This week I've devoted far too much time to hacking out a simple, (fairly) user friendly solution for farming out Handbrake encoding tasks. I regularly have about three computers available to crunch video (a server and two desktops), so it makes sense to try to utilize all that horsepower when I've got a lot of videos to encode.

There are other solutions out there; I found two different Python script collections that claimed to do the same thing, but the interfaces for both sucked, and at least one of them had insane requirements, like installing a fully functioning messaging system (ActiveMQ) to work! Needless to say, I thought the same thing could be accomplished with fewer requirements and more ease of use. So I've built out my own solution and released it on Google Code. It uses Pyro4 for RMI functionality and messaging, and it uses pyftpdlib to move files back and forth over the network. (This may be less efficient than using network shares, yes, but I figured this would make it a little more flexible.)

Note: As mentioned above, this was made in only a week, so please excuse any bugs or rough edges; I'm still working on it! That said, please send along any feature requests or bugs you find; it would be most appreciated.

To use this script set you only need five things, none of which should be too unreasonable:
  • Python 2.7
  • Pyro4 (available from PyPI)
  • pyftpdlib (available from PyPI)
  • wxPython
  • Handbrake CLI 0.9.5
Note: You can probably get away with an older version of Handbrake, assuming the CLI interface has not changed significantly. On that note you can probably get away with using older versions of Python and wxPython as well. (At least back to Python 2.6, I'd imagine.)

This has been tested on Windows 7 (64 and 32 bit) and on Fedora Core 14 (32 bit), but I imagine it should work wherever you've got Python 2.7 and wxPython available. (e.g. BSD, MacOS, etc.)

Below is a screenshot of the UI in action:
Here is the same thing as above but running on Linux.
You can see from the above that there are three encoders at work, and they'll continue to grab tasks from the central server until all the work is done.
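The pull model described above can be sketched with nothing but the standard library. This is not the project's actual code (which uses Pyro4 for messaging and FTP for file transfer); it's a stand-in that shows how workers keep claiming tasks from a central queue until the work runs out:

```python
# A minimal sketch of the pull model: a central queue of encode tasks,
# and workers that keep grabbing tasks until none remain.
import queue
import threading

def worker(name, tasks, results):
    while True:
        try:
            job = tasks.get_nowait()  # ask the "server" for the next task
        except queue.Empty:
            return                    # no work left -- this worker shuts down
        results.append((name, job))   # stand-in for running HandBrakeCLI here
        tasks.task_done()

tasks = queue.Queue()
for video in ['ep1.mkv', 'ep2.mkv', 'ep3.mkv', 'ep4.mkv', 'ep5.mkv']:
    tasks.put(video)

results = []
threads = [threading.Thread(target=worker, args=('encoder%i' % i, tasks, results))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # every video was claimed by exactly one encoder
```

Whether a worker is a thread on one box or a process on another machine, the shape is the same: the server only hands out a task once, so encoders never duplicate work.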

You can find the scripts and a wiki about how to get things running at:

Friday, September 2, 2011

Reconciling File Times Between Unix and Windows

I did some enhancements for AnyBackup not too long ago that required comparing hash keys generated (in part) from files' last modified times. I discovered an oddity about Windows that, despite years on the platform, I'd never known: file meta data has a resolution of 2 seconds. Don't believe me? Take a closer look. What this means is that the modified time (in seconds since the epoch) can never be odd; it's impossible.

It also means that when you copy a file from Linux (which tracks meta times accurately to 1/100 of a second) to Windows, the time is rounded up or down accordingly. The oddness that ensues is that the Windows copy will (sometimes) show a one second difference compared to the Linux copy. (It all depends on the rounding.)

My gut reaction was to just divide the times by 100 and remove the two least significant digits from play, but that lowers precision and doesn't quite guarantee that you'll avoid the problem entirely. (Imagine your Unix modified time is 1699999999, in Windows this will become 1700000000 -- oh the imprecision!)

When you get the modified time of a Linux file (say, through a Samba share) it'll invariably have two digits to the right of the decimal place. (At least when doing so via something like Python, not from a Windows property box.) If you convert it to an integer to remove these, the number will be rounded up or down accordingly. Instead I decided to do something like the following:

  1. Round down (regardless of the two digits to the right of the decimal)
  2. Convert to integer
  3. Check if the number is even (modulus 2)
  4. If it is odd, add 1 (making it even, like Windows would store it)
So going back to our initial example, say your Linux file comes back with a modified time of 1699999999.42:
  1. 1699999999.00 (Round down)
  2. 1699999999 (Convert to int)
  3. Not even (1699999999 % 2 = 1)
  4. 1700000000 (Add one)
  5. Voila, it matches the new Windows copy
(Yes, the conversion to an integer isn't really necessary, but we're dealing with whole numbers already anyway, so why not?)

The above steps ensure that you'll end up with a Windows compatible view of the modified time. So what does this look like in Python code? See below:

 import math, os
 
 mtime = int(math.floor(os.path.getmtime(fileLocation)))  
 if mtime % 2:  # odd -> round up to the even second Windows stores
   mtime += 1  
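Wrapped up as a reusable helper (the function name and worked examples below are mine, not AnyBackup's), with the example from earlier:

```python
import math

def mtime_for_windows(mtime):
    """Round a Unix modified time up to the next even second,
    matching Windows' 2-second timestamp resolution."""
    mtime = int(math.floor(mtime))
    if mtime % 2:   # odd -> bump up to the even second Windows would store
        mtime += 1
    return mtime

print(mtime_for_windows(1699999999.42))  # 1700000000
print(mtime_for_windows(1699999998.73))  # 1699999998 (already even)
```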

AnyBackup 0.9.3 Released

A hasty follow up to 0.9.2, 0.9.3 comes with some critical bug fixes.

Change list:

  • Issue 49 - Added additional test case for testing the skip list
  • Issue 62 - remote indexing ignoring skip list
  • Issue 63 - Improve remote index property interaction
  • Issue 64 - setName is accessed directly during indexing
  • Issue 65 - Modified rounding time differences
  • Issue 66 - UTF-16 encoded file names
  • Issue 67 - Refreshing multiple drives including remote drives only indexes remote drives if remote indexing is confirmed
  • Fix to reconcile Linux's < 1 second file time resolution and Windows's 2 second time resolution (i.e. modified times in Windows can only move in deltas of 2 seconds)

Tuesday, August 30, 2011

AnyBackup 0.9.2 Released

Changes in 0.9.2:

  • Issue 56 - Deal with duplicates
  • Issue 60 - Improve unsaved exit dialog
  • Issue 59 - Fix backup button in toolbar
  • Issue 58 - validate regular expressions in skip list
  • Issue 57 - Enhance hash key creation
  • Issue 55 - Decouple GUI and operations
  • Issue 49 - Add automated test cases
  • Issue 26 - Skip list to handle directories
  • Issue 61 - Backup files to a specified directory on backup volumes.
  • Bug fix for duplicate generation and switch to generate md5 hash keys instead of strings for hashify and reduce hash functions
Note: If you're currently using 0.9.1 or below, you'll need to follow the wiki for an extra step to ensure that your upgrade to 0.9.2 goes smoothly! You can find the wiki page here.

Wednesday, July 13, 2011

Greyhole Vs Raid 5/6

If you're here you're probably wondering about Greyhole. It lets you span drives, but why is it any better/worse/different than Raid5/6? Well, it's different for several reasons. It's a fundamentally different approach to pooling drives. Let's get visual for a moment.

A rough diagram of how Greyhole works. (As of 0.9.9)

An over-simplification of a raid configuration
The above two diagrams (excuse my glossing over details -- we're here to get theoretical, not technical) show a simple overview of Greyhole and Raid. Can you spot the differences? The most important distinction: pooling is done above the filesystem level in Greyhole. Okay, so what does this mean? Well, in a raid system, your storage space is pooled before you create a filesystem. This means the filesystem can only function when it has all the drives together.

What difference does this distinction make?

  • Individual drives from a raid 5/6 array are not readable on other machines
    • The drives only have part of a filesystem
  • Since the Greyhole pooling is done above the filesystem level, the individual drives are readable on other machines
    • If you were to take any Greyhole pool drive from your server and hook it up to another pc, all the data that lives on that drive is right there and easily accessible.
  • Since Greyhole is really just creating a logical mapping between symlinks and files it has moved to pool drives, you can add new pool volumes instantly
  • On a raid volume you'd need to reshape your entire array (for large arrays this can take > 24 hours) and then you'd need to expand the file system to take up the extra space that now exists in your logical raid volume
    • The same thing holds for removing drives, however, this will take significantly longer for Greyhole than adding since it must migrate all data onto other volumes
  • Unlike raid, since Greyhole is simply flipping files on to different pool volumes, it can use different sized drives
    • Got a 100gb, a 1tb, and a 3tb drive? No problem
    • Note that in the above example you would not be able to successfully create 2x file copy redundancy if all the drives were full (think about it: 1.9tb of the files on the 3tb drive would have no other drive to hold their second copies)
  • Raid requires volumes of the same size to run (unless you use something like LVM to combine smaller partitions up to the size of the other volumes); it has to calculate parity across all the data, and to do that consistently the number of bytes on each volume must be the same
    • You could partition larger drives out, but if you put two partitions from the same drive in an array you've completely negated your fault tolerance: if that larger drive died, raid 5 would be shot and raid 6 would be at the end of its fault tolerance
Fault Tolerance
  • Greyhole and Raid 5/6 each have their pros and cons for parity / redundancy
  • Raid 5/6 can handle one / two drive failures and still keep your data intact, and it can do this while only sacrificing one / two drives to parity out of the total number of drives you're using
    • This is efficient, but it can also take a long time for large volumes to repair once new drive(s) are added in to replace the failed ones
    • Once you step beyond one / two failures all the data is dead and gone completely
      • Since your data is spanned across all your volumes, the chance of any data being wholly sound once the array is completely degraded is slim, and once that degraded array goes down, it won't be coming back up
  • Greyhole lets you set X file copies per Samba share, so you can set it to two and Greyhole will create two copies of every file you transfer to it
    • This is, of course, less efficient; instead of giving up 1/x (or 2/x) of your space to parity, you're now using half your space for copies -- a one to one backup
    • You can potentially lose data with just two failures
      • If you have data on drive 1 and Greyhole creates a backup copy on drive 2 and then both those drives fail, you've lost said data
        • Note: This assumes you have 2x file copies set, if you set 3x file copies it would take 3 failures to lose data -- but then you would need 3x the hard drive space for 1x data
        • See the chart at the end of this section for a visual explanation of how two failures could result in data loss even when you're creating 2x file copies
      • In the above situation all data would be safe in raid 6, but would be gone in raid 5
  • Surviving drives are not impacted by failures in Greyhole
    • If you have five drives in a Greyhole pool and drive three dies, the other four drives are fine and still accessible
*Note: If you diligently keep backups, fault tolerance is not the biggest concern in the world.
Notice that Your_Picture.jpg is on Drive 1 and Drive 2. Even though you've got redundancy, if both those drives died, Your_Picture.jpg is forever lost :(
* Note that the above configuration is illustrating a share with 2 file copies and the 'most_available_space' dir selection algorithm 

  • As of right now Greyhole relies on Samba to capture file system events
    • This means all file operations must occur through Samba or Greyhole will not know about them
    • If you want to manipulate the Greyhole volume locally on your server you must locally mount the volume with cifs
    • Note: There's an open ticket on the Greyhole github about using an alternative mechanism for logging file system events which would decouple Greyhole from Samba, but this has not been done as of 0.9.9
  • Since raid 5/6 is done below the filesystem level, this is not an issue and you can modify locally, through Samba, etc
  • Greyhole will give you little if any read performance boosts
    • It's possible you can end up reading multiple files from different drives at the same time and get a speed boost this way, but that's about it
  • Raid 5 / 6 stripe data across all your drives at a low level
    • When you read a file there's a high likelihood that the file actually spans many or all of the drives in your raid volume
      • This means if you read a file the system doesn't just have to wait on the current read to go through, it can go on to the next drive and wait on it as well, and the next, etc, giving you a good boost in read speeds
  • Greyhole writes are no slower than writing directly to a drive*
    • *Once the file is written to the landing zone Greyhole will have to copy it again to one of your pool drives, so in reality it takes something like twice the amount of time to write a file, plus any overhead for creating meta data -- though this is largely transparent to the end user!
  • Raid 5 / 6 has a good amount of overhead associated with writing data since it must create parity for the data, which requires complex calculations and read operations across all the disks
So what does all this mean, really? Well, Greyhole is generally a far more flexible framework for pooling. It is limited though, in its dependence on Samba. It also won't give you any significant performance boosts. It does allow you to instantly grow your pool. Raid 5 / 6 is more efficient, but has a far more rigid structure and in extreme failure scenarios results in total data loss. In cases of minor failure each system has its merits depending on your viewpoint.
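The efficiency tradeoff can be made concrete with a back-of-the-envelope calculation. This assumes n identical drives and ignores filesystem overhead and mixed drive sizes; the function below is purely illustrative, not part of any of these tools:

```python
# Rough usable capacity for the three schemes discussed above,
# given n identical drives (Greyhole configured for 2x file copies).
def usable_tb(n_drives, drive_tb):
    return {
        'raid5':       (n_drives - 1) * drive_tb,  # one drive's worth of parity
        'raid6':       (n_drives - 2) * drive_tb,  # two drives' worth of parity
        'greyhole_2x': n_drives * drive_tb / 2.0,  # every file stored twice
    }

print(usable_tb(5, 2))  # {'raid5': 8, 'raid6': 6, 'greyhole_2x': 5.0}
```

With five 2tb drives, raid 5 nets you 8tb, raid 6 nets 6tb, and Greyhole with 2x copies nets 5tb -- and the raid advantage only grows as you add drives, which is exactly the "efficiency vs. flexibility" trade described above.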

The bottom line for me: when dealing with a large volume, raid 6 was a hassle. Reshaping or recovering the array took about 40 hours, even over eSATA with a reasonably fast dual core processor. With Greyhole things are far more flexible. If my SATA controller decides to spontaneously reset a port, I do not have to worry about an array falling over and the ensuing forced reassembly. In fact, when the drives have reset, Greyhole hasn't even blinked. For a home user scenario, if you're willing to deal with 2x the space usage* (more akin to Raid 10), then Greyhole is a clear winner for its flexibility. If you're willing to put up with raid's rigidity and you cannot abide the space required for one to one redundancy in Greyhole, then raid 5 or 6 is by no means a bad choice. Like most things it comes down to your situation and preferences, but I hope this has given you a basis for making an informed decision between the two.

* You do not have to use 2x storage space. If you are confident in your backup strategy you could set 1 file copy and use all your storage space for files and no redundancy. In this case any failure will result in data loss -- which isn't a big deal if your backups are up to date.



Greyhole:
  1. Support for varied volume size
  2. Flexible architecture -- easily and quickly add / remove drives
    1. Any individual drive can be read from other machines
  3. Better worst-case fault tolerance
  4. Only provides for one to one backups (2x the space)
  5. No performance gains
  6. Coupled to Samba
    1. Requires locally mounting through Samba to change storage pool files on the server
Raid 5/6:
  1. Generally provides a read performance boost
  2. More efficient fault tolerance
    1. Raid 5 requires 1 drive for parity no matter how many total and raid 6 requires 2 drives for parity no matter how many total
    2. In raid 5 any single drive failure can be tolerated and the array will rebuild once you replace the failed volume, raid 6 can handle any two drive failures
  3. Is like any other volume and does not require samba or any other interface for interaction with array files
  4. Rigid requirements
    1. Drives are only readable when all are together in an assembled array
    2. Individual drives cannot be read on other machines
    3. Requires identically sized volumes
  5. Rebuild / reshaping times for large volumes can be slow
  6. If you surpass the fault tolerance (2 or 3 failures depending on your raid level) your data is completely gone.

Friday, July 1, 2011

AnyBackup On The Web

All I can say is wow. Apparently July 1st rolled around and people suddenly heard of AnyBackup -- realistically a small group of people, but people nonetheless. 0.9.1 has gotten more downloads in the last day than all the previous versions combined. I've been making and using the application for about half a year now, during which time I've continued to improve and polish the program. It continues to make my life easy and I certainly hope it is helping others.

All that said, it seems like AnyBackup is suddenly open to a much wider user base, so I would not be surprised if people begin discovering new bugs. If you're using AnyBackup and you run across bugs, please, let me know what they are! You can raise issues at and I'll do my best to address them in a timely fashion.

For anyone who is a new user and is confused about how AnyBackup works, please read the wiki. (Be especially careful when selecting a backup drive, it will delete any files on the backup drives which are not on your content drives!)

AnyBackup on the web:

Wednesday, June 29, 2011

Programmatically Scrolling a wxListBox

Something I stumbled on recently that I could not for the life of me find an answer to. It seems so simple: you have a listbox and you want to programmatically scroll it. Why would you want to do this? Well, maybe you need to refresh the listbox; that was my case. In most cases you have to clear the list and repopulate it, which is fine, but it'll plop your user right back to y=0. That may be fine for small lists, but for large lists it can be a pain! You might also want to persist a listbox position between sessions.

There are functions such as EnsureVisible which will scroll to a specific item, which might work for some use cases, but I wanted to refresh the ListBox, and in that case whatever item I chose may very well be gone once the listbox is repopulated. Aside from that, there's no handy way that I could find to figure out which items are currently in view! Scrolling via item specification is a pretty half-baked way to achieve the overall goal of automatically scrolling the listbox.

The first function I came across is the aptly named GetScrollPos. It takes an argument which specifies the orientation of the position you want (wx.VERTICAL or wx.HORIZONTAL), and it allows you to get the vertical position quite easily. Halfway there, right? Well... not quite. See, there is also a 'handy' function called SetScrollPos. Sounds like a match made in heaven, no? No. You see, SetScrollPos sets the scrollbar's position, but it does not affect the underlying window or widget. So even though your vertical scrollbar is now scrolled to position Y, you'll notice that your list is still showing starting at item 0... not terribly helpful. I googled and trawled forums and scanned the API documentation and could not find a clean or obvious approach.

There is a method listbox inherits called ScrollLines. It does exactly what you'd think it does based on the name. You pass it a number (negative or positive) and it will scroll X lines up or down (based on if the number is negative or positive respectively). Sounds promising! But there is no function to get the line you're scrolled to! And my hopes were dashed again.

Then I got desperate. I thought, 'What if I take the vertical output from GetScrollPos and feed it into ScrollLines?' Immediately I answered myself, 'Probably a great big, inconsistently-scrolling mess!' But I tried it anyway. And praise be to the wx.Gods, it worked! Now, I've only tested this one in Windows 7, I cannot attest to it working on any other Windows platform, let alone Mac or *nix/BSD.

Enough yammering, let's see some code! The below is from my media player project. self.seriesList is a wx.ListBox:

    def refreshList(self, evt=None):
        pos = self.seriesList.GetScrollPos(wx.VERTICAL)  # remember where we were
        self.seriesList.Clear()
        # self.shows: wherever your Show objects live
        for show in reversed(sorted(self.shows, key=lambda x: x.getName())):
            self.seriesList.Insert(show.getName()+' (%i/%i)'%(show.getWatchedEpisodeCount(),show.getEpisodeCount()),0,show)
        self.seriesList.ScrollLines(pos)  # feed the old position back in to restore the view

I hope this was helpful to someone! I couldn't find this anywhere.

Update 7/26/2011 -- This same method works for wx.ListCtrl as well.

wxPython AuiManager

I recently switched AnyBackup to use wx.AUI for pane management instead of just using plain old panels and sizers. First off, let me just say that SplitterWindows can go jump in a lake. They are painful to tweak, the end result isn't all that pretty, etc. Using the AuiManager, on the other hand, is very pleasant once you get your head around a few things!

A few benefits:
  • Prettier
  • Dockable, floatable, maximizable, closeable panels
  • Dead easy layout management
  • Did I mention it's pretty?
There are a few concepts you need to understand for AuiManager layout management:

  • Direction (Left,Right,Center,Top,Bottom)
    • If you've ever used the BorderLayout in Java with Swing this shouldn't be too hard to understand
    • Each position represents a part of your frame, the top will add an item to the top, bottom to bottom, etc
    • The code for the below test application can be found here:
  • Position
    • Position lets you place multiple items in a single area
    • For left, center, or right, position stacks items vertically
    • For top or bottom, position stacks items horizontally
    • The code for the below test application can be found here:
  • Row
    • Like positions, rows also let you stack multiple items in one area
    • Rows behave opposite to positions: in left, center, and right, items stack horizontally, etc
    • The code for the below test application can be found here:

  • Layer
    • Notice in the above examples that when the left label is given a higher layer, it takes up a global left position instead of a local left position; this is what I meant by higher layers 'trumping' lower ones

For those of you who haven't guessed yet, let me put this right out there: you can combine layers, positions, rows, and directions any which way you please. What does this mean? It means you can organize your content pretty much any way you can think to mix and match these various control features.

Consider the below example:
We've created three sets of rows with two positions so we can stack both horizontally and vertically in one area. You can combine most any of these features. Experiment! Get a feel for how the various properties combine, it's the best way to learn. Code for the above example can be found here. I hope this example helps!

Removing And/Or Replacing a Greyhole Drive

I see this question a lot: how do you replace or remove a Greyhole pool drive? First off, yes, there is an easy and correct way to do this. Second, it is not to simply remove the drive from greyhole.conf and restart Greyhole; that will not do what you want. The correct process is easy enough.

So let's say that something terrible happens and one of your drives begins to give the click of death. It still works, but there's no saying for how long. (You should have backups; shame on you if you don't!) Or perhaps it's something far more mundane: you're running low on space and you want to replace a smaller drive with a larger one. In either case, the below will work.

  • Add your new drive to the greyhole.conf file
    • You don't have to hook up a new drive, if your remaining volumes have enough space to absorb the file copies stored on the drive that's going away, feel free to skip this section
  • Make sure you follow all the steps to get your new drive Greyhole ready (mount it, create the gh folder, create the .greyhole_uses_this file)
  • Restart Greyhole to make sure your updated config has been picked up
Now comes the fun part! We're going to use a handy little option called --going (-n). Basically this option lets you tell Greyhole, 'Hey, this drive has valid file copies, but it's going to go away soon, so don't count these file copies towards the total.' Okay, maybe that's a mouthful; how about, 'Hey, Greyhole! Copy all the files on this volume to other volumes!' While overly simplified, that gives a clearer impression of what's going on.
  • Run `greyhole --going=/path/to/drive/that/is/being/removed`
    • Where /path/to/drive/that/is/being/removed matches the path for the volume that is listed in your greyhole.conf
    • Once you run this command Greyhole will automatically remove this drive from your greyhole.conf
  • Once this is run Greyhole will schedule an fsck which should proceed shortly if not immediately
  • This can take a while, if you watch the Greyhole log you should see it running through all your files and creating new file copies for any files that are on the drive which has been marked as going
  • After this is done you can unmount the drive and remove it as you wish
    • For sanity I'd unmount the drive and then verify I can still access files that were on the removed drive. If yes, then you can be fairly certain that Greyhole correctly migrated all the data
    • Once you've sanity tested, physical removal of the drive shouldn't be a problem
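Put together, the whole procedure looks roughly like the following. These are illustrative commands only: the device, mount points, and the service restart command are assumptions for the sketch, and the exact greyhole.conf option name depends on your Greyhole version.

```shell
# 1. Bring the replacement drive online and make it Greyhole-ready
sudo mount /dev/sdd1 /mnt/hdd4          # example device and mount point
sudo mkdir /mnt/hdd4/gh                 # create the gh folder...
sudo touch /mnt/hdd4/gh/.greyhole_uses_this   # ...and the marker file

# 2. Add /mnt/hdd4/gh to the storage pool in /etc/greyhole.conf,
#    then restart Greyhole so it picks up the new config
sudo /etc/init.d/greyhole restart       # init system may vary

# 3. Tell Greyhole the old drive is going away; it will schedule an
#    fsck and re-copy that drive's files onto the remaining volumes
sudo greyhole --going=/mnt/hdd1/gh
```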

Tuesday, June 28, 2011

wxPython ListBox PopupMenu

I've been working on my TV Emulator project in Python for a few months now. For one of my windows I use a listbox, and I wanted to create a popup menu. I've created plenty of popup menus with TreeCtrls. It's easy enough: there's a handy right click event which you can grab the position from. For a listbox? No such luck.

If you google this issue you're likely to see many people say "Use a listctrl!" That's a perfectly valid answer, creating a popup menu is easy with a listctrl, but a listctrl is a more complex object and might be overkill for your needs. There's got to be a way to use a popup menu on a listbox, right? Right! Let's get to it.

So, we can use wx.ContextMenuEvent to fire a function when you right click the listbox; this event can be bound to a listbox with wx.EVT_CONTEXT_MENU. Below is a test application I put together to show what happens when you try to grab the position from a ContextMenuEvent. (Hint: It doesn't work! See the screenshot below.)
#! /usr/bin/python

import wx
import wx.lib.agw.aui as aui

class myFrame(wx.Frame):
    def __init__(self, parent, ID, title, position, size):
        wx.Frame.__init__(self, parent, ID, title, position, size)
        self.listbox = wx.ListBox(choices=[], id=-1, parent=self, style=wx.LB_EXTENDED)
        self.mgr = aui.AuiManager(self)
        self.mgr.AddPane(self.listbox, aui.AuiPaneInfo().Center())
        for i in xrange(10):
            self.listbox.Append('Item %i' % i)
        self.createMenu()
        # Fire showPopupMenu on right click
        self.listbox.Bind(wx.EVT_CONTEXT_MENU, self.showPopupMenu)
        self.mgr.Update()

    def createMenu(self):
        self.menu = wx.Menu()
        item1 = self.menu.Append(-1, 'Item 1')
        item2 = self.menu.Append(-1, 'Item 2')

    def showPopupMenu(self, evt):
        # This position is NOT relative to the frame -- the menu shows
        # up far from where you actually clicked (see the screenshot)
        position = evt.GetPosition()
        self.PopupMenu(self.menu, position)

class WXApp(wx.App):
    def OnInit(self):
        frame = myFrame(None, -1, 'Test App', wx.DefaultPosition, wx.Size(680, 550))
        frame.Show()
        return True

def main():
    wxobj = WXApp(False)
    wxobj.MainLoop()

if __name__ == '__main__':
    main()

Notice the popup menu shows up far off the point where the mouse clicked!
So obviously this doesn't work correctly as is, right? Well, there's a way around it. We can keep the context menu event, or we can use the right button up event; either works. The point is we need an event that fires when the right mouse button is clicked on the listbox. The problem is that we can't depend on the event for popup menu positioning, because it isn't giving a position relative to your frame. But this information is accessible from wx. See below for a modified showPopupMenu function. (All other code is the same.)
    def showPopupMenu(self, evt):
        # Ask wx for the raw mouse position, then make it frame-relative
        position = self.ScreenToClient(wx.GetMousePosition())
        self.PopupMenu(self.menu, position)
This looks better! The popup menu now appears where the mouse is located at click time.
With the above modification we're now grabbing the mouse's position directly from wx and then getting a position that's relative to the frame. With this position we can now accurately show a popup menu! I hope this helps someone.

Sunday, June 19, 2011

Media Player Project Update

Back in April I made a post about creating some kind of a 'TV Emulator'. Well, I've not been idle on this. I've been busily figuring out how VLC's http interface works and all the various VLC command line arguments (there are a lot of them!). It's gotten to the point where it's a reliable little app. It weighs in at just over 1200 lines of code; you have to love Python for its concision. The program offers the following features to date.

As a refresher, the core goal of the project was: pick a random TV series from those available, play the first (chronologically) unwatched episode, and persist the watched/unwatched information.
  • Add directories to scan on refreshes
    • Directories are assumed to have the following structure:
      • drive:\path\to\dir\<show name>\<any or no season organization -- this is optional>\*S##E##*.<valid video extension>
        • Or for you *nix people: /path/to/dir/<show name>/<any or no season organization -- this is optional>/*S##E##*.<valid video extension>
      • Where S##E## is the season number and episode number -- all my ripped box sets have been ripped and named to this format precisely so I can make easy assumptions like this
      • If you're brave you could just update the threadedIndex function to use assumptions that are valid for your organization system, but obviously I cannot promise that won't break anything else (but hey, it's Python, it shouldn't be too hard, right?).
  • Look for new content on your added drives on demand
  • Track what episodes have been watched
  • Mass unwatch / watch episodes in a series
  • Blacklist series (so they won't show up on subsequent refreshes)
  • Create 'channels'
    • To start with you have an irremovable channel called 'Default' (very imaginative, I know) which contains all your shows, you can add new channels by using the bottom channel bar and hitting 'Add'
      • You can give your added channel a custom name and add your desired series to it
      • A series can belong to multiple channels
      • If you watch an episode in one channel it will be reflected as watched in all channels
  • Save your current episode and position in the episode per channel
    • If you change channels your episode / position in the first channel is saved
    • If you later go back to that first channel (assuming you didn't watch the episode you were on already) it will resume the episode from where you left off -- basically every channel persists its state
  • Skip to the next show without marking the current episode as watched. Want to watch this episode, just not right now? The next button is for you :)
  • Mark an episode as watched and skip to the next (the big eye button does this). If you just watched this episode the other day (say, on actual television) you can skip it and mark it as watched; no harm, no foul
  • Automatically start VLC with the proper http interface enabled
    • This app requires an active http interface, since that's what it uses to communicate with VLC. So if it finds that VLC is not running (or at least no instance with http active), it will try to find VLC and run it with --extraintf oldhttp (and it will tell VLC never to repair avi indexes, since that can cause a bad workflow loop)
  • Automatically mark an episode as watched once you get through 90+% of it
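For illustration, pulling the season and episode numbers out of that S##E## naming convention takes only a short regex. This is a sketch under my assumptions (the function name is mine), not the project's actual threadedIndex code:

```python
import re

def parse_episode(filename):
    """Extract (season, episode) from names like '*S02E05*', or None."""
    match = re.search(r'S(\d{2})E(\d{2})', filename, re.IGNORECASE)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_episode('Reno 911 - S02E05.avi'))  # (2, 5)
print(parse_episode('notes.txt'))              # None
```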
And that's pretty much the extent of what the app does. It's filled with a few assumptions about the way I have things organized to keep things simple -- but I can do that since my girlfriend and I are the primary users. Below is a screenshot of the culmination of my efforts. It's nothing too terribly advanced, but right now it does exactly what I set out to do and I'm pretty happy with it. I haven't released it anywhere yet -- I'm not even certain anyone would be interested in it, but if anyone is I'd be happy to share.
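That last auto-watched feature from the list boils down to a simple ratio check. A sketch (the names here are mine, not the app's):

```python
def is_watched(position_secs, duration_secs, threshold=0.9):
    """True once playback has passed the given fraction of the episode."""
    if duration_secs <= 0:
        return False
    return position_secs / float(duration_secs) >= threshold

print(is_watched(1260, 1320))  # True  (about 95% of the way through)
print(is_watched(600, 1320))   # False
```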

TV Emu 0.4 for VLC
Please note, I own the Reno 911 box sets and ripped them to digital format for my own HTPC use. (Read: Don't sue me.)

Sunday, June 12, 2011

AnyBackup 0.9 Released

I've released AnyBackup 0.9 today. The GUI has been overhauled, a few key features have been added, and several slow spots have been sped up significantly.

The major changes: I've switched the GUI to use AUI, which is a lot prettier and easier to get around in. I've tweaked remote indexing and switched to Pyro for sending remote Python objects; this cut the remote indexing time roughly in half. I've also added the ability to select which directories you want to back up from content drives.

Yes, I'm aware the screenshot below says 0.8, I forgot to update this before building the version and it's such a minor issue I saw no reason to rebuild/upload for it.

Download at:


  • Issue 43 - Update GUI to use aui
  • Issue 44 - Search result file click broken
  • Issue 45 - Add status text to splash screen
  • Issue 46 - File view area not showing whole directory path
  • Issue 47 - Switch remote indexing to use Pyro
  • Issue 48 - Avoid using deepcopy in threaded actions
  • Issue 50 - File icon type not always correctly displayed.
  • Issue 51 - Remote index function not updating drive space information
  • Issue 25 - Allow addition of folders only
AnyBackup 0.9

Wednesday, May 25, 2011

AnyBackup Explained

I've been working on AnyBackup for a number of months now. It's gone through some drastic changes in that time, including the language it was written in. I'd like to take a little time now to try to explain exactly what it does and why I think it's useful, or -- at the very least -- why it's useful to me.

The impetus for creating the program was the fact that I have a large amount of data on a single volume (via Greyhole) which I'd like to keep backed up to multiple smaller drives. For a long time I was accomplishing this mostly manually with a little help from some scripts I wrote. There is backup software out there which spans drives, but not exactly in the manner I was looking for. Most backup programs think that disk spanning ends at CDs or DVDs, and I had no intention of feeding several hundred DVDs into my computer once a month; I'd never finish my backups before it was time for the next! There are a few programs out there that allow spanning across hard drives, but they do a few things that break their usefulness to me.
  • They assume all backup drives are connected
  • They back up into one large, proprietary file
I wanted a program with flexibility. So what if backup drive four isn't connected? Let me know that and give me a chance to connect it! What if I want to bring my backup drive to a friend's house? I'd like that data to be readily readable! And so my search faltered, and I gave up. A program that suits these needs may be out there, but it eluded me. So it was then that I decided I'd just make my own.

AnyBackup is powerful and simple. The basic idea: back up any number of drives to any number of drives. Or, more specifically, back up all the files which match your criteria (a list of valid file extensions and a blacklist of regular expressions) on a given set of content drives to a given set of backup drives.
A high-level, crude chart to visualize what AnyBackup does.

In the chart above you can get a visual sense of what AnyBackup does: take the data from X volumes and back it up to Y volumes. Either side in the above chart could be content / backup; it doesn't matter. What matters is that you have enough room on either side to fit all your files.
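The matching criteria (a whitelist of extensions plus a blacklist of regexes) can be sketched as a small predicate. Names here are illustrative, not AnyBackup's actual internals:

```python
import os
import re

def wanted(path, extensions, blacklist):
    """Return True if a file should be backed up: its extension is in the
    whitelist and its name trips none of the blacklist regexes."""
    ext = os.path.splitext(path)[1].lstrip('.').lower()
    if ext not in extensions:
        return False
    name = os.path.basename(path)
    return not any(re.search(pattern, name) for pattern in blacklist)
```

So `wanted('/share/video/sample.mkv', {'mkv', 'avi'}, [r'(?i)sample'])` comes back False, because the blacklist catches it even though the extension is allowed.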

What AnyBackup does:
  • Easily lets you back up large volumes (in my case an 11 TB Greyhole volume) to several smaller drives
  • Back up groups of drives to groups of drives
  • Blacklist filenames with regular expressions
  • Provide a list of file extensions you want to back up
  • Identifies drives by volume name and serial number; drive letters can change and it will not affect AnyBackup's record keeping
  • Gives you the flexibility of connecting one backup drive at a time (i.e. using a dock)

What AnyBackup does NOT do:
  • Perform automated backups
    • AnyBackup plays things loose and has no integration with the OS, so it has no way of knowing when things have changed; it requires reindexing before backing things up
    • Everything is driven through the GUI, there is no service
    • AnyBackup largely assumes your backup drives won't be connected most of the time
  • Allow multiple backup sets
      • AnyBackup only lets you back up one set of drives to another, whether that's my setup with one large volume and a handful of backup drives, or several content drives and several backup drives
      • This is an issue for enhancement in the Google Code tracker; if I get time or someone else is inspired it should be easy enough to set up
  • Backing up only certain directories of a drive
      • Right now you can only back up whole drives; you can choose what file types get backed up, but that's the extent of it
    • There's a standing enhancement issue to allow addition of directories in drives instead of just drives
    • Update: This is no longer true as of version 0.9!
  • Perform incremental indexing
      • Since AnyBackup does not have any integration with the OS, it has no idea what has happened to your data between runs; before a backup you'll need to refresh your drives to make sure all new files have been picked up
      • This isn't necessarily a bad thing. Even if we had incremental indexing, what if you disconnected a drive and took it somewhere? We'd have no way to know if you put files on, or removed files from, the drive on another system!
Below is a screenshot of the latest release of AnyBackup (0.8 as of writing this):
AnyBackup 0.8

AnyBackup 0.8 Released


  • Issue 33 - Write to user's home dir to avoid UAC conflicts
  • Issue 35 - Missing drive prompt crashes AnyBackup
  • Issue 34 - Add backup drive lock feature
  • Issue 32 - Use last modified time in hash check
  • Issue 37 - Allow multiple selections in Drive list
  • Issue 36 - Remote Indexing
  • Issue 39 - Make indexing on drive addition optional
  • Issue 40 - Allow addition of multiple drives at once
  • Issue 41 - Revise GUI to show backup and content drives at the same time
Notable differences in this version:
  • Remote index server script provided to bypass samba for indexing a Linux shared directory
    • This GREATLY speeds up indexing of large volumes over samba; testing showed 20-30 minutes of indexing drop to ~3 minutes
  • There is no longer a drop down for selecting Backup or Content drives; all drives are now displayed in one list, with content drives listed in bold
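At its core, such a server is just a directory walk that runs on the machine owning the disk and ships back paths and sizes in one go, instead of samba stat-ing every file across the network. A stdlib-only outline (the real script wires a walk like this up over Pyro):

```python
import os

def index_tree(root):
    """Walk a directory tree locally and collect (relative path, size) pairs.

    Run this on the machine that owns the disk: one network round trip
    replaces a per-file stat over samba.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                entries.append((os.path.relpath(full, root),
                                os.path.getsize(full)))
            except OSError:
                pass  # file vanished mid-walk; skip it
    return entries
```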

Download at

Saturday, May 14, 2011

Regular Expressions Excluding Strings

I ran into a situation recently where it would be very, very handy to be able to write a regular expression which would both look for certain content and exclude others. I admit this is probably not the most efficient way to go about things, but for small and quick use cases I don't see why it shouldn't be used. See below for some explanations!

Negative lookahead:

Let's say you have a set of strings: 'foobar', 'barbar', 'barfoo'. Now let's further speculate that, for some unknown (but perfectly valid to you) reason, you want only the strings in the above set which contain 'bar', but only where 'bar' is not followed by 'foo'. (I'm making this distinction now: it's OK to have 'foo' before 'bar', just not after.)

If your regular expression engine supports it, and most do -- at least Perl and Python do -- you can write something like this:

  • bar(?!foo)
  • Python: re.search('bar(?!foo)', string)
  • Perl: string =~ /bar(?!foo)/
  • 'foobar' and 'barbar' would match the above regular expression, 'barfoo' would not -- perfect!
Now, as I said, this is for looking ahead; you cannot write something like (?!foo)bar and expect it to do what you want, as you're attempting to look behind. Conveniently, see below for how to do a negative lookbehind.

Below is a Python snippet to really flesh things out:
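Something along these lines demonstrates the negative lookahead:

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Keep only the strings containing a 'bar' that is NOT followed by 'foo'
matches = [s for s in strings if re.search(r'bar(?!foo)', s)]
print(matches)  # ['foobar', 'barbar']
```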

Negative lookbehind:

We can use the same list as above to demonstrate a lookbehind, but this time let's assume we only want strings which contain 'bar', but only where 'bar' is not preceded by 'foo'.

We can write a regular expression for negative lookbehinds like this:
  • (?<!foo)bar
  • Python: re.search('(?<!foo)bar', string)
  • Perl: string =~ /(?<!foo)bar/
  • 'barfoo' and 'barbar' would match the above regular expression, 'foobar' would not, again, exactly what we set out to do!
Another Python snippet below:
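The lookbehind version of the same idea:

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Keep only the strings containing a 'bar' that is NOT preceded by 'foo'
matches = [s for s in strings if re.search(r'(?<!foo)bar', s)]
print(matches)  # ['barbar', 'barfoo']
```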


The only thing which makes these lookaheads and lookbehinds negative is the exclamation point; you can turn the requirement around by swapping it for an equals sign, so bar(?=foo) would suddenly make the string 'barfoo' the only valid string in our set. Pretty intuitive!

Sunday, April 17, 2011

AnyBackup 0.7.1 Released


  • Issue 23 : Display drive name next to letter in the add dialog
  • Issue 28 : 0.6 broke paginated results
  • Issue 30 : Display file count / name during indexing
  • Issue 31 : Allow deletion of multiple items at once for valid extension / skip list

Thursday, April 14, 2011

Restoring Deleted Files in Greyhole And Terminology Explained

Greyhole has a lot of interesting terms that might not offer an immediate explanation as to what they actually represent. I also see a lot of people asking how they can restore deleted files in Greyhole. Well, let's get to it!

Update 7/20/2011: I submitted a change to my forked Greyhole github which gboudreau merged into the main Greyhole git repo. This change simplifies all the terminology, so I've updated the guide below to show the new terms alongside their old-world counterparts. These new terms will be live in 1.0.0! Everything that looks like (This) refers to Greyhole 1.0.0+.

First let's get a list of terms together.

  • Tombstone (Metadata File)
  • Attic (Trash)
  • Graveyard (Metadata Store)
None of these make much sense right away. (Well, they do if you understand the thought process behind them, but that can take time!) So let's go through and analyze each item -- I'll put them through the layman's translator for you!

Tombstone (Metadata File)
Tombstone (Metadata File) -- "A file containing metadata about a file in your Greyhole pool."

  • Tombstones (Metadata files) are automatically created and stored for every file that is written to your Greyhole pool
  • Tombstones (Metadata files) are stored in a collection called a graveyard (metadata store) (we'll get to this later)
  • Every drive in your pool has its own collection of Tombstones (Metadata Files)
  • Tombstones (Metadata Files) mirror the structure of your share
    • You have a file in your share stored at /path/to/sharename/folder/file
    • Let's say Greyhole moves this file to drive sdb1 which is mounted at /mnt/hdd1
    • The Tombstone (Metadata File) for file will be created at /mnt/hdd1/gh/.gh_graveyard/sharename/folder/file (/mnt/hdd1/gh/.gh_metastore/sharename/folder/file)
  • If you have a share set to save multiple copies of a file, there will be a Tombstone (Metadata File) created on each drive that contains a copy
  • If you have only one copy of files per share you will actually have two Tombstones (Metadata Files).
    • One will be on the drive that contains the file
    • The other will be in a backup graveyard (metadata store) -- this is so you know what files have gone missing if a drive dies!
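The mirroring described above is mechanical enough to express as a one-liner. A hypothetical helper using the 1.0.0+ names ('gh' being the landing-zone directory from the example):

```python
import posixpath

def metadata_path(drive_mount, share, file_path, backup=False):
    """Where the Metadata File (Tombstone) for a pooled file lives on a drive.

    Mirrors the share's structure under <drive>/gh/.gh_metastore (or the
    _backup store, used when only one copy of each file is kept).
    """
    store = '.gh_metastore_backup' if backup else '.gh_metastore'
    return posixpath.join(drive_mount, 'gh', store, share, file_path)
```

For instance, `metadata_path('/mnt/hdd1', 'sharename', 'folder/file')` gives '/mnt/hdd1/gh/.gh_metastore/sharename/folder/file', matching the example above.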
Graveyard (Metadata Store)
Graveyard (Metadata Store) -- "A storage pool drive's collection of Tombstones (Metadata Files)"

  • Every drive in your Greyhole pool has a Graveyard (Metadata Store).
  • The Graveyard's (Metadata Store's) location is /path/to/pool/drive/gh/.gh_graveyard (/path/to/pool/drive/gh/.gh_metastore)
  • The directory structure inside .gh_graveyard (.gh_metastore) mirrors that of your share; the only difference is that the files it contains are not your files, but rather Tombstones (Metadata Files) describing them. You'll notice they are small and contain just a little bit of text (see the above definition for more about Tombstones (Metadata Files))
    • There may also be a .gh_graveyard_backup (.gh_metastore_backup) folder on pool drives, which contains Tombstones (Metadata Files) for files on shares whose file-copies setting is one
Attic (Trash)
Attic (Trash) -- "Greyhole's recycling bin"

  • Whenever Greyhole gets into a situation where it would delete a file, Greyhole moves the file into the Attic (Trash) instead.
    • If you do a delete, Greyhole moves the file to the Attic (Trash).
      • Note: If you have a program that creates temporary files when opening a file (like Word or vim, etc.) and then deletes those temporary files, you'll end up with files in your Attic (Trash) that you don't necessarily recognize. (See below for how to access files in your Attic (Trash).)
    • If you have >1 copies of files per share and you write to a file, the out-of-date copies (those that weren't modified) are sent to the Attic (Trash).
  • Each drive has its own Attic (Trash) folder.
    • The Attic (Trash) folder is at /path/to/pool/drive/gh/.gh_attic/ (/path/to/pool/drive/gh/.gh_trash)
      • The folder structure for an Attic (Trash), like a Graveyard (Metadata Store), mirrors that of your share, but, unlike a Graveyard (Metadata Store), the files inside an Attic (Trash) are real files.
  • To get to the files in the Attic (Trash) you can either browse to the path above for each of your Greyhole drives or you can setup a Greyhole Recycle Bin Share
    • You can create a special share name with one of the following names in Samba: 'Greyhole Attic', 'Greyhole Trash', 'Greyhole Recycle Bin'
    • Create the above share like you would any other Greyhole share (that is, use the vfs object and dfree properties)
    • When Greyhole sees this in your Samba config it will create, under the share path you specify, symlinks to all files deleted after the share is created -- older files in the Attic (Trash) must be accessed via the paths above.
      • This won't take effect until after the Greyhole service has been restarted, so remember to do this after making changes to your Samba or Greyhole configs!
    • From this share you can copy your deleted files back to the pool or delete them.
      • Files deleted from the Attic / Recycle Bin share are deleted permanently.
  • Having deleted files move to the Attic (Trash) is the default behavior. If you do not want this to happen you can change the delete_moves_to_attic (delete_moves_to_trash) property in greyhole.conf (either globally or per share)
    • If you set this property to "no" Greyhole will permanently delete all files, they will not be moved to the Attic (Trash) ever.
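For reference, the recycle-bin share looks like any other Greyhole share in smb.conf. A sketch (the path is an assumption -- it's wherever you want the symlinks created -- and you should double-check the dfree command path against your own Greyhole install):

```
[Greyhole Attic]
    path = /mnt/greyhole_trash
    vfs objects = greyhole
    dfree command = /usr/bin/greyhole-dfree
    writeable = yes
```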

Sunday, April 10, 2011

Media Player Project

I've got a project in beta right now that I might eventually release. Here's the lowdown.

Concept: Broadcast network emulation.

What does this mean? Basically, a random distribution of TV series you have ripped from DVD to your htpc. Before you say you can just shuffle a playlist, read on. If you shuffle a playlist you do get a random distribution of shows, this is true, but the chronological order of those shows is not preserved. (i.e. you get Season 5 Episode 2 of show X and then a few positions down the playlist you're suddenly watching Season 1.) For some shows this doesn't matter, especially if there are no overarching story lines, but for others it can lead to a very disjointed viewing experience.

The ideal solution for me here is to have a player that:

  1. Chooses a random television series
  2. Finds the lowest unwatched episode for that series
  3. Plays the episode
  4. Marks the episode as watched in a persistent store once finished
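The selection logic in those steps can be sketched in a few lines. The data shapes here are illustrative; the real app persists its store on disk:

```python
import random

def next_episode(library, watched, rng=random):
    """Pick a random series, then its lowest unwatched (season, episode).

    library: {series: [(season, episode), ...]}
    watched: set of (series, season, episode) tuples already seen
    Returns (series, season, episode), or None when everything is watched.
    """
    candidates = []
    for series, episodes in library.items():
        unwatched = sorted(ep for ep in episodes
                           if (series,) + ep not in watched)
        if unwatched:
            candidates.append((series, unwatched[0]))
    if not candidates:
        return None
    series, (season, episode) = rng.choice(candidates)
    return (series, season, episode)
```

Step 4 (marking as watched) is then just adding the returned tuple to the watched set and persisting it.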
I also want this to be lightweight, so I've decided to forgo writing something like this for XBMC.

Instead I've wrapped this around VLC, more specifically, VLC's http interface. (I may switch to using VLC's python bindings for an internal controller later for a more all-in-one experience -- it depends on the momentum for the project.)

The project as it stands will do the steps outlined above and a little more. It makes web calls to an open VLC player (with the http interface enabled) in the background and will constantly play new episodes while the player is active and record each episode's status as it goes.

It's very basic right now, you can add shows, remove them, and play. You can't 'unwatch' shows, etc yet. But I plan to build this out. I may even add media scrapers and turn this into a slick interface for VLC in general. Time will tell. I'm not sure if anyone else has this same desire, I could be alone. And if so, I'll happily keep this to myself. :)

Saturday, April 9, 2011

AnyBackup 0.6 Released

I've released AnyBackup 0.6 today!


  • Resize-able elements
  • File sizes are now displayed (in MB) alongside the file names in the browser and result tables
  • Result page selector is only cleared at the appropriate times
  • Sticky backup mode keeps track of pending write directories to cluster files appropriately
  • Windows exe version now comes in an easy to use installer package
For the few people who've discovered it, enjoy! I've been using it regularly for my own backups and it accurately backs up my Greyhole pool.

I've had discussions with a friend about how a branch would be approached for linux, it sounds doable, but right now I don't have the interest since my primary concern is Windows where I use the application. If anyone would like to take a crack at it I can explain my ideas. I certainly wouldn't mind any additional python developers helping maintain / improve AnyBackup.

AnyBackup 0.6
Download at

Sunday, March 6, 2011

AnyBackup 0.5

I put up version 0.5 of AnyBackup on the Google Code page today.

Change Log:
  • Issue 19: AnyBackup throws exception on trying to view empty drives.
  • Issue 8: Update restore function to use pending write feature 
  • Bugfix: Directories were indexed with their own name in their path variable, this has been fixed.
  • Bugfix: Directories should come before files in search results. 
  • Cleanup: Removed unnecessary free space checks during backups / restores since the pending write feature ensures we won't use a drive with inadequate space.
  • Issue 11: Paginated Results 
  • Issue 17: AnyBackup keeps track of empty folders 
  • Bugfix: Folders were not included in search results, they are now.
  • Issue 1: Create AnyBackup Property File 
  • Issue 2: Add option to use balanced backup volume selection
  • Issue 15: AnyBackup doesn't report failure upon attempting to add an unnamed drive.
  • Issue 14: Backup Sort Bug -- this is now resolved by initializing the to-backup array at the same time as the old-files array.
Not a lot has visually changed about the program aside from the new result page selector, nonetheless here is a screenshot of the updated AnyBackup:

Please note that the above mkv files are rips from DVD box sets I own. :)

Tuesday, February 22, 2011

AnyBackup Beta 2 -- Now with 100% more Python!

Following up on Beta 1, I've now finished Beta 2 of AnyBackup! What's different? Well, I ported the entire project to Python instead of Perl. Not because I think there's any great functional benefit of one over the other, but rather because it's something I needed to pick up for work anyway. The main benefit of this release is that pretty much all the standing issues have been resolved; see below. I'll be looking into setting up a Google Code page in the near future for AnyBackup, since it seems like a much cleaner approach than just updating my blog + mediafire account all the time.

Note: I work on this program in my spare time, primarily to solve my own backup needs; I release it for others to use since I figure others may have similar backup needs that AnyBackup can fulfill. That said, this is beta software and (even if it weren't) I can make no guarantees that it will always 100% back up all your data and that no data loss will occur. There can be bugs that I don't know about / haven't hit. So basically: buyer beware, use at your own risk, I can't be held responsible for any issues that arise. If you do hit issues, don't hesitate to report them!

  • Now written in Python instead of Perl (see explanation above)
  • File comparisons now made against file sizes (in KB) and directory paths in addition to file names
    • See issues below for more details
  • File object revised to store directory root and directory path separately
    • This allowed me to get rid of some ugly regular expression hacks in the backup and indexing methods
  • When backing up or restoring files, AnyBackup now creates a pending size change (the total of file size deletions and additions) and makes sure that any additional adds will fit; this keeps a drive from running out of space mid-operation and failing copies
  • Icon added to show whether a drive is currently connected (+ and green for connected, x and red for disconnected)
  • Drive background color in the drive list now changes based on free space (>15% green, >10% yellow, <10% red)
  • Drive free/total space added to drive list (in GB)
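The free-space coloring rule is just a threshold check; something like this (the color names are illustrative stand-ins for whatever the GUI actually sets):

```python
def drive_color(free_fraction):
    """Map a drive's free-space fraction to its list background color:
    >15% green, >10% yellow, otherwise red."""
    if free_fraction > 0.15:
        return 'green'
    if free_fraction > 0.10:
        return 'yellow'
    return 'red'
```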
Known Issues:

  • If a backup volume is full but there is new content to back up that prefers this full volume (due to the above backup logic based on parent folders), it will still choose the full volume as the destination and then throw an error that there isn't enough space on it.
    • Follow up -- my solution for now is to prefer a drive based on the above logic, but create a pending write total and check whether all the files we want to add will fit; if not, it will instead grab the drive with the most free space and back the files up there. In the next release I'll add a property file which will let you turn this sticky-files feature on or off. (For now it's on; in the next release you can turn it off and it will always place files on the drive with the most free space at the time of choosing.)
  • For some reason, when running a new backup, AnyBackup will often leave a few old files (that is, files that are no longer found on your content drives -- meaning they've been deleted / renamed / written to). This normally has no negative repercussions, but I'll figure out what's causing it eventually.
  • Some people are having issues launching the exe-packaged version of AnyBackup. I'm not sure what's causing the issue; I've been using Cava Packager to create the exes + installers and it runs perfectly on my test machines. I'll put up an archive of the raw Perl later for those that have issues -- the downside to using the raw Perl is that you'll need a valid Perl install with all the packages I use.
    • Since this is now in Python, packaged with py2exe, I'm assuming this will no longer be an issue.
  • File comparisons are made only against file names, so if you change/write to a file, or have duplicate file names in multiple directories, this can lead to inconsistencies such as backing up only one file instead of several, or not backing up the updated version of a file
    • I augmented this naive comparison; it now checks file sizes and directory names in addition to file names. The only possible complication I see at this point is that the file size comparison is done in KB and not bytes. I'm not sure if this will cause issues (I'm thinking not), but feel free to let me know if this impacts small files with minute changes! (That's the only situation where I really see it being an issue.) If I get reports I'll look into changing this.
  • Minor issue with the most-free-drive logic: it needs to be updated to incorporate the new pending write total, otherwise it may not grab the drive with the most free space, but rather the drive with the most free space before any operations have taken place (i.e. if we have 10gb pending to a drive which has 8gb more free than any other volume, it should no longer be considered the most free drive, but right now it will be)
  • The CLI interface for the app isn't guaranteed to work yet and all features are definitely not built out (primarily: restore definitely does not work, and backup will not remove old files from backup volumes). What does work I haven't tested, so buyer beware when using it in this release. In the next release I will finish building out its features and also package it as an exe alongside the GUI interface.
I'm only going to add one new screenshot to show off the (very small) UI changes to the main window. Despite the complete port of the app to Python, it's still using wxWidgets, which is pretty much identical across languages.
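The pending-write fix from the known issues boils down to counting queued bytes against each drive's reported free space before choosing a destination. A hypothetical sketch of that selection (names and data shapes are mine, not AnyBackup's internals):

```python
def pick_backup_drive(drives, pending, needed):
    """Choose a destination drive, counting bytes already queued ('pending
    writes') against each drive's reported free space.

    drives:  {name: free_bytes}
    pending: {name: bytes already scheduled to be written there}
    needed:  size of the file(s) we want to place
    Returns the drive with the most *effective* free space that still fits
    the new files, or None if nothing fits.
    """
    best, best_free = None, -1
    for name, free in drives.items():
        effective = free - pending.get(name, 0)
        if effective >= needed and effective > best_free:
            best, best_free = name, effective
    return best
```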

(As before, this code is released under the GPL license!)
Update: Google Code project created! You can find it here.

Thursday, February 17, 2011

Dynamically Convert A Raid5 Array to Raid6

Transferred from my old blog:
If you have the newest mdadm tool, version 3.1.1 and above, it is now capable of changing the raid level of an array. If you cannot find a copy of the latest version for your distro you can compile from source via mdadm's git repo, git://
The below assumes that you have one spare drive ready to add to your array (/dev/sdb), that /dev/md0 is the raid5 array you would like to move to raid6, and that /dev/md0 starts with 4 raid devices.
Use the below commands:
mdadm --add /dev/md0 /dev/sdb
mdadm --grow /dev/md0 --level=6 --raid-devices=5
Once this completes, you should have a fully functioning raid6 array. Enjoy your dual parity.
Further, you can also change the chunk size dynamically while you're at it. The default chunk size of mdadm (which I believe they plan to up in future versions) is a paltry 64k; you'd be much better off with something in the 256-512k range. To change the chunk size of an array, use the following:
mdadm --grow /dev/md0 --chunk=512
I've seen several references now using --chunk-size, so it's possible in future versions this may be the correct flag instead of --chunk, just something to be aware of. Also, upping your chunk size to 512 may not be possible depending on the total size of your array. It's possible that mdadm will spit out an error stating that the total array size is not divisible by 512, in which case you'll have to settle for something smaller (i.e. try 256 or 128).