Saturday, February 4, 2012

Peer to Peer Lending

Prologue:


In a totally different direction from my normal posting which revolves around technical / programming questions I'm going to spend some time tracking my venture into peer to peer lending. I'm a twenty something professional and I'm attempting to find intelligent places to invest money. I've got a 401k, I have some money in a brokerage account invested in equities, and I've got some in cash in savings, etc. Next I've begun playing in the peer to peer lending space.

What Is P2P Lending?

This isn't quite so scary or random as it sounds. I'm not going around soliciting random people asking if they'd like to borrow money from me, rather I'm making use of third party services. There are two main players in the United States: Prosper and Lending Club. They've had very different histories, but both have been in the news sporadically in the last few years, especially while banks have been hesitant to lend to any but the most credit worthy individuals. (Specifically I've seen many different articles about small businesses turning to this type of funding when they need capital.)

The idea is pretty simple, find people who want to invest money and connect them with people who need to borrow. Of course there is a lot more to it than that. Due diligence is necessary. Are the borrowers who they say they are? Do they make as much money as they say they do? What kind of interest rate should they pay based on their credit history, income, and assets? These are all things companies like Lending Club and Prosper take care of for you. From an investment perspective all you need to do is decide what your risk appetite is. They both have various ranking systems (which I won't go into detail on -- but which the sites describe in depth) which range from people with excellent credit, low risk and returns, to people who have a more spotted past and who pay a much higher interest rate.

What's The Catch?

But wait, this is scary, right? What if the person stops paying? Default is always a risk, your money is not 100% guaranteed, but in an environment where you're lucky to see 3/4 of a percent APR from a 'high-yield' savings account, it takes some risk to get a return. This is not a savings account. That said, it is less scary than it sounds. While you can purchase the entirety of a loan, it is highly discouraged. Rather you can buy as little as $25 of any given loan. So if you take $2,500 to start with, you can buy $25 chunks of 100 different loans. Then if one person defaults, you've lost only the remainder of interest+principal on that note.
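
To put numbers on that, here is a small sketch of the diversification arithmetic. The `diversify` helper is a name I made up for illustration, and the figures are just the $2,500 and $25 from the paragraph above.

```python
def diversify(total, note_size):
    """Return how many notes you can buy and the share of principal one default can cost."""
    notes = int(total // note_size)
    worst_case_share = note_size / total   # losing one whole note, as a fraction of the total
    return notes, worst_case_share

notes, share = diversify(2500, 25)
print("%d notes, at most %.1f%% of principal at risk per default" % (notes, share * 100))
```

A single default can never cost more than the face value of that one note, which is why small chunks across many loans matter.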

All this said, realize that P2P lending is largely an illiquid investment. Think of it as something more akin to a bond or a certificate of deposit. They can both be liquidated readily enough, but you'll most likely take a hit when doing so. There is a trading platform for notes (both services have a platform but they do not trade with each other, only users of each site may trade with other members). Every month you'll get a payment on a note which will include some money that goes towards the note's principal and some which goes towards interest. The principal is simply you getting your base back, the interest is your profit here. (Don't get too excited when you see that you're getting $100+ / month on your $5k investment.) You can, of course, choose to take the interest back out as profit, or you can reinvest it and allow your money to go back to work for you.
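
To see why a $100+ monthly payment is mostly your own money coming back, here is a rough sketch using the standard fixed-payment amortization formula. The 10% rate and 36 month term are assumptions I picked for illustration, not figures from either site, and the function names are hypothetical.

```python
def monthly_payment(principal, apr, months):
    """Fixed monthly payment for a fully amortizing loan."""
    r = apr / 12
    return principal * r / (1 - (1 + r) ** -months)

def first_payment_split(principal, apr, months):
    """Split the first payment into its principal and interest parts."""
    payment = monthly_payment(principal, apr, months)
    interest = principal * apr / 12        # interest accrued in the first month
    return payment, payment - interest, interest

# Assumed example: $5,000 lent out at 10% APR over 36 months.
payment, principal_part, interest_part = first_payment_split(5000, 0.10, 36)
print("payment %.2f = principal %.2f + interest %.2f"
      % (payment, principal_part, interest_part))
```

Under those assumptions the payment is roughly $161 a month, of which only about $42 is interest. The rest is principal being returned.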

What I'm Doing:

I've decided to try out both sites and see which one I prefer. I've allocated $5,000 to each as a test investment and I intend to reinvest my profits. I may also allocate additional funds later if the downside risk is truly as minimal as the statistics make it appear. After a month and a half on Prosper and half a month on Lending Club, I can say they each have their own benefits.

Prosper:

  • Allows automatic investments based on your filters.
    • This is really nice as it means that as soon as you get at least $25 in payments it'll push those into new notes and maximize the compounding effect of reinvestment.
  • Very broad / flexible filters
    • Lending Club has filters as well, but they only seem to allow you to make very broad restrictions
  • Higher interest rates
    • Prosper tends to have a higher range of risk, that is they allow people with lower credit scores than Lending Club, and at most levels just seem to charge a slightly higher rate of interest than Lending Club
  • Takes their money by playing the borrowing/investing spread
    • They advertise 8% to a borrower, for example, and 7% to you, and then they take 1% of the resulting interest payments from the borrower
    • Lending Club takes 1% of your payments directly, so it is possible, if unlikely, that you could lose money: if a person takes out a loan and pays it back in full within a month, you could end up with less principal than you started with. This couldn't happen with Prosper
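
Here is a rough sketch of that early payoff scenario, modeling the two fee styles exactly as I've described them above. The `one_month_payoff` helper is hypothetical, the 7% and 8% rates are the example rates from the bullets, and the $25 note is the minimum purchase mentioned earlier. Neither site's real fee schedule is being quoted here.

```python
def one_month_payoff(principal, apr, fee_on_payment=0.0):
    """What the investor keeps if the borrower repays everything after one month."""
    interest = principal * apr / 12
    payment = principal + interest
    return payment - payment * fee_on_payment

# Spread model: the investor is simply quoted the lower 7% rate, no fee on payments.
spread_net = one_month_payoff(25.00, 0.07)

# Payment-fee model: the investor earns the full 8%, but 1% comes off every payment received.
fee_net = one_month_payoff(25.00, 0.08, fee_on_payment=0.01)

print("spread model: %.2f, payment-fee model: %.2f" % (spread_net, fee_net))
```

In the payment-fee model the 1% is charged on the returned principal too, so a payoff after one month leaves slightly less than the original $25.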
Lending Club:
  • Volume
    • Lending club flat out has more loans than Prosper, roughly twice as many loans which means there's a lot more loans out there for you to invest in
    • More loans means you can get your money invested faster and it's easier to narrow down your filters without fear of getting too specific and ending up with a pool of loans that's too small
  • Less risk
    • Lending Club has higher credit standards than Prosper and the majority of their large pool of loans are A or B rated, the returns are lower, of course, but this can be a good thing depending on your risk appetite.
Full Disclosure:
  • Prosper: First invested 12/21/2011
  • Lending Club: First invested 1/9/2012
  • $5,000 to each account
    • ($0.17 more in LC due to their bank account verification in which they withdraw a small amount)
  • In both accounts I've gone for a mix of high interest loans (I've outright excluded most AA rated loans on Prosper and A rated loans on LC to boost my return rates)
That said, it's February 4th, 2012: a little more than a month in on Prosper and a little less than a month in on Lending Club. Let's review!


Prosper:
Payments received: $146.74
Principal paid off: -$87.42
Payments in excess of principal: =$59.32
Principal charge-offs: -$0.00
Gain/loss to date: =$59.32
Principal value of active notes: $4,987.58
Total active notes: 190
   Current: 188
   Past due (1-30 days): 1
   Past due (31+ days): 0
   Payoff in progress:
Total charged-off notes: 0
Total notes paid in full: 0
Total notes sold: 0

Account value as of writing: $5,062.82

Lending Club:
Deposits: $5,000.17
Investment (includes In Funding): ( $5,000.00 )
Principal Received: $0.00
Note Interest: $0.00
Late Fees Received: $0.00
Recoveries: $0.00
Collection Fees: $0.00
Service Charges: ( $0.00 )
Adjustments: $0.00
Withdrawals: ( $0.00 )
Pending Withdrawals: ( $0.00 )
Referral Bonus: $0.00





Account value as of writing: $5,015.54*

*Lending Club counts 'accrued interest' as part of your account value. This is the amount of interest you are owed up to a given point in time, and it does not reflect any guaranteed earnings. The cash value of my account is still $5,000.17, as my investments are still too new to have received payments yet.

Conclusion:


I can't draw any meaningful results from my Lending Club account yet as it's obviously two or three weeks newer than my Prosper account and isn't even fully invested yet (about 8% of it is still pending). But I'm positive on the results of my Prosper account. I have one note which is <15 days late. It is possible it will go into default or that it will be brought back to current, but even if it goes into default, the gains from the rest of my loans already more than cover it. I'll check in again next month and see where things are at. So far I think it's a great alternative to leaving money in Savings. (I do not need it for an emergency fund as I have other liquid assets available elsewhere -- obviously this is not a good alternative to an emergency fund as it is not liquid at par.)

Before you start!


If you've found this looking to try out a peer to peer lending service, be aware there are some great tools at your disposal. All loan information from Prosper and Lending Club is accessible. People have used this to create some awesome pages and tools that allow you to create filters and then benchmark them against historical loans so you can see how well, or poorly, a given subset of loans performs. One such web page is http://lendstats.com/. I highly encourage anyone looking to get involved in P2P lending to play around with this page before investing. Also, remember to diversify! This can be said of any investment, but why buy one $2,500 loan when you can buy a hundred $25 loans?

See the second update. See the third update.

Saturday, October 29, 2011

Clustered Handbrake Encoder

This week I've devoted far too much time to hacking out a simple, (fairly) user friendly solution to farming out Handbrake encoding tasks. I've regularly got about three computers that are available to crunch some video (a server and two desktops), so it makes sense to try to utilize all that horsepower when I've got a lot of videos to encode.

There are other solutions out there. I found two different Python script collections that were said to do the same thing, but the interfaces for both sucked and at least one of them had insane requirements, like installing a fully functioning messaging system (activemq), to work! Needless to say, I thought the same thing could be accomplished with fewer requirements and more ease of use. So I've built out my own solution and released it on Google Code. It uses Pyro4 for RMI functionality and messaging, and it uses pyftpdlib to move files back and forth over the network. (This may be less efficient than using network shares, yes, but I figured this would make it a little more flexible.)

Note: As mentioned above, this was made in only a week, so please excuse any bugs or rough edges, I'm still working on it! That said, please reply with any feature requests or bugs you find, it would be most appreciated.

To use this script set you only need five things, none of which should be too unreasonable:
  • Python 2.7 (python.org)
  • Pyro4 (Can be gotten from pypi)
  • pyftpdlib (Can be gotten from pypi)
  • wxPython (wxpython.org)
  • Handbrake CLI 0.9.5 (handbrake.fr)
Note: You can probably get away with an older version of Handbrake, assuming the CLI interface has not changed significantly. On that note you can probably get away with using older versions of Python and wxPython as well. (At least back to Python 2.6, I'd imagine.)

This has been tested on Windows 7 (64 and 32 bit) and on Fedora Core 14 (32 bit), but I imagine it should work where ever you've got Python 2.7 and wxPython available. (i.e. BSD, MacOS, etc)

Below is a screenshot of the UI in action:
Here is the same thing as above, but running on Linux.
You can see from the above that there are three encoders at work, and they'll continue to grab tasks from the central server until all the work is done.
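
The "grab tasks until the work is done" pattern can be sketched with nothing but the standard library. To be clear, this is not the project's code: the real scripts use Pyro4 and pyftpdlib across machines, while this stand-in uses threads and an in-process queue. The `run_cluster` function, the encoder names, and the file names are all made up for illustration.

```python
import queue
import threading

def run_cluster(tasks, encoder_names):
    """Hand out tasks to workers until none are left; return (encoder, task) pairs."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    done = []
    done_lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task = pending.get_nowait()   # ask the "server" for the next job
            except queue.Empty:
                return                        # nothing left, so this encoder stops
            # A real worker would run the Handbrake CLI here; the sketch only records the job.
            with done_lock:
                done.append((name, task))

    threads = [threading.Thread(target=worker, args=(n,)) for n in encoder_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

results = run_cluster(["a.avi", "b.avi", "c.avi", "d.avi", "e.avi"],
                      ["server", "desktop1", "desktop2"])
```

Because each worker pulls its next job only when it is free, faster machines naturally end up doing more of the work.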


You can find the scripts and a wiki about how to get things running at: http://code.google.com/p/clustered-handbrake/




Friday, September 2, 2011

Reconciling File Times Between Unix and Windows

I did some enhancements for AnyBackup not too long ago that required comparison of hash keys generated using (in part) files' last modified time. I discovered an oddity about Windows that, despite years of being on the platform, I'd never known: file metadata can have a resolution of only 2 seconds. (Strictly speaking this is the modified-time resolution on FAT-formatted volumes; NTFS records much finer timestamps.) Don't believe me? Take a closer look. What this means is that on such a volume the modified time (in seconds since the epoch) can never be odd; it's impossible.

It also means that when you copy a file from Linux (which tracks file times to a fraction of a second) to Windows, the time is rounded up or down accordingly. The oddness that ensues is that when you look at the Windows file copy it'll (sometimes) show a one second difference as compared to the Linux copy. (It all depends on rounding.)

My gut reaction was to just divide the times by 100 and remove the two least significant digits from play, but that lowers precision and doesn't quite guarantee that you'll avoid the problem entirely. (Imagine your Unix modified time is 1699999999, in Windows this will become 1700000000 -- oh the imprecision!)

When you get the modified time of a Linux file (say through a Samba share) it'll invariably have two digits to the right of the decimal point. (At least when doing so via something like Python, not from a Windows property box.) If you convert it to an integer to remove these, the number will be rounded up or down accordingly. Instead I decided to do something like the following:

  1. Round down (regardless of the two digits to the right of the decimal)
  2. Convert to integer
  3. Check if the number is odd (modulus 2)
  4. If it is odd, add 1
So going back to our initial example, say your Linux file comes back with a modified time of 1699999999.42:
  1. 1699999999.00 (Round down)
  2. 1699999999 (Convert to int)
  3. Not even (1699999999 % 2 = 1)
  4. 1700000000 (Add one)
  5. Voila, it matches the new Windows copy
(Yes, the conversion to an integer isn't really necessary, but we're dealing with whole numbers already anyway, so why not?)

The above steps ensure that you'll end up with a Windows compatible view of the modified time. So what does this look like in Python code? See below:


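 import math
 import os
 # Floor to whole seconds, then bump an odd value up to the next even second.
 mtime = int(math.floor(os.path.getmtime(fileLocation)))  
 if mtime % 2:  
   mtime += 1  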
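
To check the rule against the worked example without touching the filesystem, the same logic can be wrapped in a function. This is only a sketch: `windows_compatible_mtime` is a name chosen for illustration, and the two timestamps are the ones from the example above.

```python
import math

def windows_compatible_mtime(mtime):
    """Floor to whole seconds, then bump an odd value up to the next even second."""
    seconds = int(math.floor(mtime))
    if seconds % 2:
        seconds += 1
    return seconds

linux_mtime = 1699999999.42      # value read from the Linux copy in the example
windows_mtime = 1700000000       # what the Windows copy reports in the example

normalized_linux = windows_compatible_mtime(linux_mtime)
normalized_windows = windows_compatible_mtime(windows_mtime)

# The "divide by 100" idea still disagrees across this boundary.
naive_linux = int(linux_mtime) // 100
naive_windows = windows_mtime // 100
```

Both copies normalize to the same value, while the divide-by-100 approach gives two different numbers for them.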

AnyBackup 0.9.3 Released

A hasty follow up to 0.9.2, 0.9.3 comes with some critical bug fixes.

Change list:

  • Issue 49 - Added additional test case for testing the skip list
  • Issue 62 - remote indexing ignoring skip list
  • Issue 63 - Improve remote index property interaction
  • Issue 64 - setName is accessed directly during indexing
  • Issue 65 - Modified rounding time differences
  • Issue 66 - UTF-16 encoded file names
  • Issue 67 - Refreshing multiple drives including remote drives only indexes remote drives if remote indexing is confirmed
  • Fix to reconcile linux's < 1 second file time resolution and windows's 2 second time resolution ( i.e. modified times in windows can only move in deltas of 2 seconds )

Tuesday, August 30, 2011

AnyBackup 0.9.2 Released

Changes in 0.9.2:

  • Issue 56 - Deal with duplicates
  • Issue 60 - Improve unsaved exit dialog
  • Issue 59 - Fix backup button in toolbar
  • Issue 58 - validate regular expressions in skip list
  • Issue 57 - Enhance hash key creation
  • Issue 55 - Decouple GUI and operations
  • Issue 49 - Add automated test cases
  • Issue 26 - Skip list to handle directories
  • Issue 61 - Backup files to a specified directory on backup volumes.
  • Bug fix for duplicate generation and switch to generate md5 hash keys instead of strings for hashify and reduce hash functions
Note: If you're currently using 0.9.1 or below, you'll need to follow the wiki for an extra step to ensure that your upgrade to 0.9.2 goes smoothly! You can find the wiki page here.

Wednesday, July 13, 2011

Greyhole Vs Raid 5/6

If you're here you're probably wondering about Greyhole. It lets you span drives, but why is it any better/worse/different than Raid5/6? Well, it's different for several reasons. It's a fundamentally different approach to pooling drives. Let's get visual for a moment.

A rough diagram of how Greyhole works. (As of 0.9.9)

An over-simplification of a raid configuration
The above two diagrams (excuse me glossing over details -- we're here to get theoretical, not technical) show a simple overview of Greyhole and Raid. Can you spot the differences? The most important distinction: Pooling is done above the filesystem level in Greyhole. Okay, so what does this mean? Well, in a raid system, your storage space is pooled before you create a file system. This means the filesystem can only function when it has all the drives together.

What difference does this distinction make?

Modularity
  • Individual drives from a raid 5/6 array are not readable on other machines
    • The drives only have part of a filesystem
  • Since the Greyhole pooling is done above the filesystem level, the individual drives are readable on other machines
    • If you were to take any Greyhole pool drive from your server and hook it up to another pc, all the data that lives on that drive is right there and easily accessible.
Flexibility
  • Since Greyhole is really just creating a logical mapping between symlinks and files it has moved to pool drives, you can add new pool volumes instantly
  • On a raid volume you'd need to reshape your entire array (for large arrays this can take > 24 hours) and then you'd need to expand the file system to take up the extra space that now exists in your logical raid volume
    • The same thing holds for removing drives, however, this will take significantly longer for Greyhole than adding since it must migrate all data onto other volumes
  • Unlike raid, since Greyhole is simply flipping files on to different pool volumes, it can use different sized drives
    • Got a 100gb, a 1tb, and a 3tb drive? No problem
    • Note that in the above example you could not keep 2x file copy redundancy once the drives fill up: after the 100gb and 1tb drives are full, the remaining 1.9tb of the 3tb drive has no other drive to hold a second copy
  • Raid requires volumes of the same size to run (unless you use something like lvm to combine smaller partitions to the size of other volumes). It has to calculate parity across all the data, and to do that in a consistent way the number of bytes on each volume must be the same.
    • You could partition larger drives out, but if you put two partitions from the same drive in an array you've just completely negated your fault tolerance, if that larger drive died raid 5 would be shot and raid 6 would be at the end of its fault tolerance
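The mixed drive size example above (100gb, 1tb, 3tb) can be worked out with a short sketch. This is a simplified model, not anything taken from Greyhole itself: it assumes two copies of every file must sit on two different drives and ignores individual file sizes. The function name is hypothetical.

```python
def max_two_copy_capacity(drive_sizes):
    """Most data that can be stored with 2 copies of everything on different drives."""
    total = sum(drive_sizes)
    largest = max(drive_sizes)
    # Every file needs a partner copy off the largest drive, so the other
    # drives combined cap how much data can be stored redundantly.
    return min(total / 2, total - largest)

drives_gb = [100, 1000, 3000]
usable_gb = max_two_copy_capacity(drives_gb)
stranded_gb = max(drives_gb) - usable_gb   # space on the big drive with no partner

print("usable with 2 copies: %d gb, unpaired on largest drive: %d gb"
      % (usable_gb, stranded_gb))
```

With those three drives only 1.1tb of data can be stored with two copies, and 1.9tb of the 3tb drive is left without a partner.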
Fault Tolerance
  • Greyhole and Raid 5/6 each have their pros and cons for parity / redundancy
  • Raid 5/6 can handle one / two drive failures and still keep your data intact, and it can do this while only sacrificing one / two drives to parity out of total number of drives you're using
    • This is efficient, but it can also take a long time for large volumes to repair once new drive(s) are added in to replace the failed ones
    • Once you step beyond one / two failures all the data is dead and gone completely
      • Since your data is spanned across all your volumes, it is unlikely that any data will be wholly sound once the array has failed, and once that degraded array goes down, it won't be coming back up
  • Greyhole lets you set X file copies per Samba share, so you can set it to two and Greyhole will create two copies of every file you transfer to it
    • This is, of course, less efficient: instead of giving up 1/x (or 2/x) of your space to parity, where x is the number of drives, you're giving up half of it to a one to one backup
    • You can potentially lose data with just two failures
      • If you have data on drive 1 and Greyhole creates a backup copy on drive 2 and then both those drives fail, you've lost said data
        • Note: This assumes you have 2x file copies set, if you set 3x file copies it would take 3 failures to lose data -- but then you would need 3x the hard drive space for 1x data
        • See the chart at the end of this section for a visual explanation of how two failures could result in data loss even when you're creating 2x file copies
      • In the above situation all data would be safe in raid 6, but would be gone in raid 5
  • Surviving drives are not impacted by failures in Greyhole
    • If you have five drives in a Greyhole pool and drive three dies, the other four drives are fine and still accessible
*Note: If you diligently keep backups, fault tolerance is not the biggest concern in the world.
Notice that Your_Picture.jpg is on Drive 1 and Drive 2, even though you've got redundancy, if both those drives died, Your_Picture.jpg is forever lost :(
* Note that the above configuration is illustrating a share with 2 file copies and the 'most_available_space' dir selection algorithm 
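
The two failure scenario can also be played out in a few lines. This is a toy model under loud assumptions: it imitates a "most available space" style of placement by always picking the emptiest drives, the drive sizes and the extra file names are invented, and none of it is Greyhole's actual code.

```python
def place_copies(free_space, size, copies=2):
    """Put `copies` copies on the drives that currently have the most free space."""
    chosen = sorted(free_space, key=free_space.get, reverse=True)[:copies]
    for drive in chosen:
        free_space[drive] -= size
    return set(chosen)

def lost_files(placement, failed):
    """A file is lost only when every drive holding one of its copies has failed."""
    return sorted(name for name, drives in placement.items() if drives <= failed)

# Hypothetical pool (free space in gb) and files (size in gb).
free = {"drive1": 500, "drive2": 400, "drive3": 300}
files = {"Your_Picture.jpg": 5, "Your_Video.avi": 150, "Your_Song.mp3": 5}
placement = {name: place_copies(free, size) for name, size in files.items()}

print(lost_files(placement, {"drive1"}))             # any single failure
print(lost_files(placement, {"drive1", "drive2"}))   # both copies of some files gone
```

In this made-up pool no single failure loses anything, but losing drive 1 and drive 2 together takes every file whose two copies both landed there.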


Interface
  • As of right now Greyhole relies on Samba to capture file system events
    • This means all file operations must occur through Samba or Greyhole will not know about them
    • If you want to manipulate the Greyhole volume locally on your server you must locally mount the volume with cifs
    • Note: There's an open ticket on the Greyhole github about using an alternative mechanism for logging file system events which would decouple Greyhole from Samba, but this has not been done as of 0.9.9
  • Since raid 5/6 is done below the filesystem level, this is not an issue and you can modify locally, through Samba, etc
Performance
  • Greyhole will give you little if any read performance boosts
    • It's possible you can end up reading multiple files from different drives at the same time and get a speed boost this way, but that's about it
  • Raid 5 / 6 stripe data across all your drives at a low level
    • When you read a file there's a high likelihood that the file actually spans across many or all the drives in your raid volume
      • This means the system doesn't have to wait on a single drive for the whole file; it can read from the next drive at the same time, and the next, etc, giving you a good boost in read speeds
  • Greyhole writes are no slower than writing directly to a drive*
    • *Once the file is written to the landing zone Greyhole will have to copy it again to one of your pool drives, so in reality it takes something like twice the amount of time to write a file, plus any overhead for creating meta data -- though this is largely transparent to the end user!
  • Raid 5 / 6 has a good amount of overhead associated with writing data since it must create parity for the data, which requires complex calculations and read operations across all the disks
So what does all this mean, really? Well, Greyhole is generally a far more flexible framework for pooling. It is limited though, in its dependence on Samba. It also won't give you any significant performance boosts. It does allow you to instantly grow your pool. Raid 5 / 6 is more efficient, but has a far more rigid structure and in extreme failure scenarios results in total data loss. In cases of minor failure each system has its merits depending on your viewpoint.

The bottom line for me: when dealing with a large volume, raid 6 was a hassle. Reshaping or recovering the array took about 40 hours, even over eSata with a reasonably fast dual core processor. With Greyhole things are far more flexible. If my sata controller decides to spontaneously reset a port, I do not have to worry about an array falling over and the ensuing force reassembly. In fact, when the drives have reset, Greyhole hasn't even blinked. For a home user scenario, if you're willing to deal with 2x the space usage*, (more akin to Raid 10) then Greyhole is a clear winner for its flexibility. If you're willing to put up with raid's rigidity and you cannot abide the space required for one to one redundancy in Greyhole then raid 5 or 6 is by no means a bad choice. Like most things it comes down to your situation and preferences, but I hope this has given you a basis for making an informed decision between the two.

* You do not have to use 2x storage space. If you are confident in your backup strategy you could set 1 file copy and use all your storage space for files and no redundancy. In this case any failure will result in data loss -- which isn't a big deal if your backups are up to date.

Recap


Greyhole

  1. Support for varied volume size
  2. Flexible architecture -- easily and quickly add / remove drives
    1. Any individual drive can be read from other machines
  3. Better worst-case fault tolerance
  4. Only provides for one to one backups (2x the space)
  5. No performance gains
  6. Coupled to Samba
    1. Requires locally mounting through Samba to change storage pool files on the server
Raid
  1. Generally provides a read performance boost
  2. More efficient fault tolerance
    1. Raid 5 requires 1 drive for parity no matter how many total and raid 6 requires 2 drives for parity no matter how many total
    2. In raid 5 any single drive failure can be tolerated and the array will rebuild once you replace the failed volume, raid 6 can handle any two drive failures
  3. Is like any other volume and does not require samba or any other interface for interaction with array files
  4. Rigid requirements
    1. Drives are only readable when all are together in an assembled array
    2. Individual drives cannot be read on other machines
    3. Requires identically sized volumes
  5. Rebuild / reshaping times for large volumes can be slow
  6. If you surpass the fault tolerance (2 or 3 failures depending on your raid level) your data is completely gone.
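
The space efficiency difference in the recap comes down to simple fractions. Here is a small sketch assuming equally sized drives; the function name and the scheme labels are mine, chosen only for illustration.

```python
def usable_fraction(scheme, drives, copies=2):
    """Fraction of raw space left for data, assuming equally sized drives."""
    if scheme == "raid5":
        return (drives - 1) / drives      # one drive's worth of parity
    if scheme == "raid6":
        return (drives - 2) / drives      # two drives' worth of parity
    if scheme == "greyhole":
        return 1 / copies                 # every file stored `copies` times
    raise ValueError("unknown scheme: %s" % scheme)

for scheme in ("raid5", "raid6", "greyhole"):
    print("%s with 5 drives: %.0f%% usable" % (scheme, usable_fraction(scheme, 5) * 100))
```

The raid fractions improve as you add drives, while the Greyhole figure depends only on the number of file copies you set.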

Friday, July 1, 2011

AnyBackup On The Web

All I can say is wow. Apparently July 1st rolled around and people have suddenly heard of AnyBackup -- realistically a small group of people, but people nonetheless. 0.9.1 has gotten more downloads in the last day than all the previous versions combined. I've been making and using the application for about half a year now, during which time I've continued to improve and polish the program. It continues to make my life easy and I certainly hope it is helping others.

All that said, it seems like AnyBackup is suddenly open to a much wider user base, so I would not be surprised if people begin discovering new bugs. If you're using AnyBackup and you run across bugs, please, let me know what they are! You can raise issues at http://code.google.com/p/anybackup and I'll do my best to address them in a timely fashion.

For anyone who is a new user and is confused about how AnyBackup works, please read the wiki. (Be especially careful when selecting a backup drive, it will delete any files on the backup drives which are not on your content drives!)

AnyBackup on the web:
