Showing posts with label python. Show all posts

Saturday, October 29, 2011

Clustered Handbrake Encoder

This week I've devoted far too much time to hacking out a simple, (fairly) user-friendly solution to farming out Handbrake encoding tasks. I've regularly got about three computers available to crunch some video (a server and two desktops), so it makes sense to try to utilize all that horsepower when I've got a lot of videos to encode.

There are other solutions out there. I found two different Python script collections that were said to do the same thing, but the interfaces for both sucked, and at least one of them had insane requirements like installing a fully functioning messaging system (ActiveMQ) to work! Needless to say, I thought the same thing could be accomplished with fewer requirements and more ease of use. So I've built my own solution and released it on Google Code. It uses Pyro4 for RMI functionality and messaging, and pyftpdlib to move files back and forth over the network. (This may be less efficient than using network shares, yes, but I figured it would make things a little more flexible.)

Note: As mentioned above, this was made in only a week, so please excuse any bugs or rough edges. I'm still working on it! That said, please reply with any feature requests or bugs you find; it would be most appreciated.

To use this script set you only need five things, none of which should be too unreasonable:
  • Python 2.7 (python.org)
  • Pyro4 (available from PyPI)
  • pyftpdlib (available from PyPI)
  • wxPython (wxpython.org)
  • Handbrake CLI 0.9.5 (handbrake.fr)
Note: You can probably get away with an older version of Handbrake, assuming the CLI interface has not changed significantly. On that note you can probably get away with using older versions of Python and wxPython as well. (At least back to Python 2.6, I'd imagine.)

This has been tested on Windows 7 (64 and 32 bit) and on Fedora Core 14 (32 bit), but I imagine it should work wherever you've got Python 2.7 and wxPython available. (e.g. BSD, MacOS, etc.)

Below is a screenshot of the UI in action:
Here is the same thing as above, but running on Linux.
You can see from the above that there are three encoders at work, and they'll continue to grab tasks from the central server until all the work is done.
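
As a rough sketch of that pull model (this is not the project's actual code: a local queue and threads stand in for the Pyro4 server and the remote encoders, and every name here is made up for illustration):

```python
import queue
import threading

def encoder(name, tasks, done):
    """Keep pulling tasks until the shared queue is empty."""
    while True:
        try:
            video = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to encode
        # A real encoder would run the Handbrake CLI here; this only records the claim.
        done.append((name, video))

# Hypothetical work list held by the "central server"
tasks = queue.Queue()
for video in ["a.mkv", "b.mkv", "c.mkv", "d.mkv", "e.mkv"]:
    tasks.put(video)

done = []  # list.append is thread-safe in CPython
workers = [threading.Thread(target=encoder, args=(name, tasks, done))
           for name in ("server", "desktop1", "desktop2")]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(video for _, video in done))
```

Because each encoder asks for the next task only when it is free, a faster machine naturally ends up taking more of the work.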


You can find the scripts and a wiki about how to get things running at: http://code.google.com/p/clustered-handbrake/




Friday, September 2, 2011

Reconciling File Times Between Unix and Windows

I did some enhancements for AnyBackup not too long ago that required comparing hash keys generated using (in part) files' last modified times. Along the way I discovered an oddity about Windows that, despite years on the platform, I'd never known: file metadata has a resolution of 2 seconds. (Strictly speaking this is a property of FAT-formatted volumes; NTFS stores much finer timestamps.) Don't believe me? Take a closer look. What this means is that the modified time (in seconds since the epoch) can never be odd; it's impossible.

It also means that when you copy a file from Linux (which tracks meta times accurately to 1/100 of a second) to Windows, the time is rounded up or down accordingly. The oddness that ensues is that the Windows copy will (sometimes) show a one second difference compared to the Linux copy. (It all depends on rounding.)

My gut reaction was to just divide the times by 100 and remove the two least significant digits from play, but that lowers precision and doesn't quite guarantee that you'll avoid the problem entirely. (Imagine your Unix modified time is 1699999999, in Windows this will become 1700000000 -- oh the imprecision!)
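
To make that failure case concrete, here is a small sketch using the two timestamps from the example above (the variable names are only illustrative):

```python
unix_mtime = 1699999999      # odd, as reported on the Linux side
windows_mtime = 1700000000   # the same file after a copy to Windows

# Dropping the two least significant digits still leaves different values
print(unix_mtime // 100)     # 16999999
print(windows_mtime // 100)  # 17000000
print(unix_mtime // 100 == windows_mtime // 100)  # False
```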

When you get the modified time of a Linux file (say through a Samba share) it'll invariably have two digits to the right of the decimal place. (At least when doing so via something like Python, not from a Windows property box.) If you convert it to an integer to remove these, the number will be rounded up or down accordingly. Instead I decided to do something like the following:

  1. Round down (regardless of the two digits to the right of the decimal)
  2. Convert to integer
  3. Check if the number is odd (modulus 2)
  4. If it is odd, add 1
So going back to our initial example, say your Linux file comes back with a modified time of 1699999999.42:
  1. 1699999999.00 (Round down)
  2. 1699999999 (Convert to int)
  3. Not even (1699999999 % 2 = 1)
  4. 1700000000 (Add one)
  5. Voila, it matches the new Windows copy
(Yes, the conversion to an integer isn't really necessary, but we're dealing with whole numbers already anyway, so why not?)

The above steps ensure that you'll end up with a Windows compatible view of the modified time. So what does this look like in Python code? See below:


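 import math
 import os
 # Floor to whole seconds, then bump an odd value up to the next even second
 mtime = int(math.floor(os.path.getmtime(fileLocation)))
 if mtime % 2:
   mtime += 1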
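
The same logic can be wrapped in a small function and checked against the worked example. This is only a sketch; the function name is hypothetical and it takes the timestamp directly rather than reading it from a file:

```python
import math

def windows_compatible_mtime(mtime):
    """Floor to whole seconds, then round an odd value up to the next even second."""
    seconds = int(math.floor(mtime))
    if seconds % 2:
        seconds += 1
    return seconds

print(windows_compatible_mtime(1699999999.42))  # 1700000000
print(windows_compatible_mtime(1700000000.00))  # 1700000000
print(windows_compatible_mtime(1700000000.99))  # 1700000000
```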

Sunday, June 12, 2011

AnyBackup 0.9 Released

I've released AnyBackup 0.9 today. The GUI has been overhauled, a few key features have been added, and many pain points have been sped up significantly.


The major changes are that I've switched the GUI over to AUI, which is a lot prettier and easier to get around in. I've tweaked remote indexing and switched to Pyro for sending remote Python objects, which cut the remote indexing time more or less in half. I've also added the ability to select which directories you want to back up from content drives.

Yes, I'm aware the screenshot below says 0.8, I forgot to update this before building the version and it's such a minor issue I saw no reason to rebuild/upload for it.


Download at: http://code.google.com/p/anybackup/downloads/list


Changes:

  • Issue 43 - Update GUI to use aui
  • Issue 44 - Search result file click broken
  • Issue 45 - Add status text to splash screen
  • Issue 46 - File view area not showing whole directory path
  • Issue 47 - Switch remote indexing to use Pyro
  • Issue 48 - Avoid using deepcopy in threaded actions
  • Issue 50 - File icon type not always correctly displayed.
  • Issue 51 - Remote index function not updating drive space information
  • Issue 25 - Allow addition of folders only
AnyBackup 0.9

Saturday, May 14, 2011

Regular Expressions Excluding Strings

I ran into a situation recently where it would be very, very handy to be able to write a regular expression which would both look for certain content and exclude others. I admit this is probably not the most efficient way to go about things, but for small and quick use cases I don't see why it shouldn't be used. See below for some explanations!

Negative lookahead:


Let's say you have a set of strings: 'foobar', 'barbar', 'barfoo'. Now let's further speculate that for some unknown, but perfectly valid to you, reason, you want only the strings in the above set which contain a 'bar', but only where 'bar' is not followed by 'foo'. (I'm making this distinction now: this means it's OK to have 'foo' before 'bar', just not after.)

If your regular expression engine supports it, and most do -- at least Perl and Python do, you can write something like this:

  • bar(?!foo)
  • Python: re.search('bar(?!foo)',string)
  • Perl: $string =~ /bar(?!foo)/
  • 'foobar' and 'barbar' would match the above regular expression, 'barfoo' would not -- perfect!
Now, as I said, this is for looking ahead. You cannot write something like (?!foo)bar; it will not do what you want, as you're attempting to look behind. Conveniently, see below for how to do a negative lookbehind.

Below is a Python snippet to really flesh things out:
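
A minimal sketch along those lines, using the three example strings from above (the variable names are only illustrative):

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Keep strings containing a 'bar' that is not immediately followed by 'foo'
matches = [s for s in strings if re.search(r'bar(?!foo)', s)]
print(matches)  # ['foobar', 'barbar']
```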


Negative lookbehind:

We can use the same list as above to demonstrate a lookbehind, but this time let's assume we only want strings which contain 'bar', but only where 'bar' is not preceded by 'foo'.

We can write a regular expression for negative lookbehinds like this:
  • (?<!foo)bar
  • Python: re.search('(?<!foo)bar',string)
  • Perl: $string =~ /(?<!foo)bar/
  • 'barfoo' and 'barbar' would match the above regular expression, 'foobar' would not, again, exactly what we set out to do!
Another Python snippet below:
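
A minimal sketch of the lookbehind case, again using the same three strings (the variable names are only illustrative):

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Keep strings containing a 'bar' that is not immediately preceded by 'foo'
matches = [s for s in strings if re.search(r'(?<!foo)bar', s)]
print(matches)  # ['barbar', 'barfoo']
```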

Note:

The only thing which makes these lookaheads and lookbehinds negative is the exclamation point. You can easily turn the requirement around by replacing it with an equals sign, so bar(?=foo) would suddenly make 'barfoo' the only matching string in our set. Pretty intuitive!
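
For comparison, Python's re module writes the positive forms with an equals sign in place of the exclamation point: (?=...) for lookahead and (?<=...) for lookbehind. A short sketch against the same three strings (the variable names are only illustrative):

```python
import re

strings = ['foobar', 'barbar', 'barfoo']

# Positive lookahead: 'bar' must be immediately followed by 'foo'
followed = [s for s in strings if re.search(r'bar(?=foo)', s)]
# Positive lookbehind: 'bar' must be immediately preceded by 'foo'
preceded = [s for s in strings if re.search(r'(?<=foo)bar', s)]

print(followed)  # ['barfoo']
print(preceded)  # ['foobar']
```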
