@thehigherlife - I was thinking about the same thing recently (an announcement about FB becoming the biggest photo app on the planet). I found two great sources of information on this.
A very deep analysis of Flickr's architecture from a couple of years ago, including a link to Cal Henderson's presentation slides. I'm surprised he got canned from Y! yesterday, though!
i don't know anything about cal's departure, but i can say that any large co like yahoo, google, msft, apple etc is more or less impervious to the loss of one or a few technical individuals. his departure may have symbolic value but the machine that is chugging there will keep chugging, for better or for worse
The info from a public-facing, private company is striking. I'm amazed how much code and knowledge Facebook has already given back to the community given their state. Though perhaps being private and not publicly traded makes it easier. Still, it seems like they have done some things very well and wouldn't want to expose them.
Amazing? Really? There are an awful lot of companies dramatically smaller than Facebook that have released a lot more code. 37signals, with maybe a 10th the valuation (if we're being very generous to 37signals), has given away a lot more code than Facebook.
Facebook may be many things, but "amazing" generosity is not one of their characteristics. Call them "somewhat" generous, if you like. Heck, I'll even let "pretty" generous slide. But let's not be overgenerous in our assessment of just how nice Facebook is to the community. They're nice guys, and all, but they aren't saints.
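I wonder if they have some mechanism for periodically scrubbing the RAID array to catch the inevitable undetected read errors. With RAID 6 you can actually do this: recompute both parity blocks from the data and compare them with the stored ones. For a single bad block, the two mismatches together tell you which block is wrong.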
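To make the RAID 6 idea concrete, here is a minimal Python sketch of a scrub over one stripe. It assumes the common P/Q scheme (P is the XOR of the data blocks, Q is a Reed-Solomon syndrome over GF(2^8) with polynomial 0x11d and generator 2). The function names `parity` and `scrub` are made up for illustration, and this is not a claim about how Haystack or any particular RAID implementation does it.

```python
# Log/antilog tables for GF(2^8), generator 2, polynomial 0x11d (assumed scheme).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def parity(blocks):
    """Compute the P (XOR) and Q (weighted XOR) parity blocks for one stripe."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for idx, block in enumerate(blocks):
        coeff = EXP[idx]  # 2**idx in GF(2^8)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def scrub(blocks, p_stored, q_stored):
    """Return None if the stripe is consistent, the index of the single bad
    data block, "P" or "Q" if a parity block is bad, or "unknown" otherwise."""
    p_calc, q_calc = parity(blocks)
    suspects = set()
    for j in range(len(p_stored)):
        sp = p_calc[j] ^ p_stored[j]  # P syndrome for this byte
        sq = q_calc[j] ^ q_stored[j]  # Q syndrome for this byte
        if sp == 0 and sq == 0:
            continue
        if sp == 0 or sq == 0:
            # Only one parity disagrees, so that parity block itself is bad.
            suspects.add("P" if sp else "Q")
            continue
        # Single bad data block z: sp = e and sq = 2**z * e, so z = log(sq/sp).
        z = (LOG[sq] - LOG[sp]) % 255
        if z >= len(blocks):
            return "unknown"
        suspects.add(z)
    if not suspects:
        return None
    return suspects.pop() if len(suspects) == 1 else "unknown"


if __name__ == "__main__":
    stripe = [b"hay0", b"hay1", b"hay2", b"hay3"]
    p, q = parity(stripe)
    stripe[2] = b"haY2"  # simulate a silent bit flip on one disk
    print(scrub(stripe, p, q))
```

With a single parity block (RAID 5) the same check would only tell you that something in the stripe is wrong, not which block.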
Hard disks typically have undetected read error rates of about 1 bit in 1E15, so assuming they transfer 1 PB per day (roughly 9E15 bits, counting a PB as 2^50 bytes), that's about 9 errors per day. Which isn't much, I guess, but I wonder if they do anything about it.
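For anyone who wants to check the back-of-the-envelope number, here is a tiny Python sketch. The 1-in-1E15 rate and the 1 PB/day volume are the assumptions from the comment above, not measured figures, and `expected_bit_errors` is just an illustrative helper name.

```python
def expected_bit_errors(bytes_read, bit_error_rate=1e-15):
    """Expected number of bad bits when reading `bytes_read` bytes."""
    return bytes_read * 8 * bit_error_rate


PB_DECIMAL = 10**15  # 1 PB counted as 10^15 bytes
PB_BINARY = 2**50    # 1 PB counted as 2^50 bytes (strictly, 1 PiB)

if __name__ == "__main__":
    print(expected_bit_errors(PB_DECIMAL))  # roughly 8 per day
    print(expected_bit_errors(PB_BINARY))   # roughly 9 per day
```

So the estimate lands at about 8 or 9 flipped bits per day depending on which definition of a petabyte you use.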
(Not speaking officially, obviously. I had nothing to do with Haystack anyway).
They're images. So you flip a bit every once in a great while; as long as the bit isn't part of image metadata, no human user will notice. The next time they load the image (at least, assuming it's been long enough for it to get evicted from the CDN), they see the right data.
I think the Haystack folks made the right decision here. A rule of thumb is that it takes 3-5 years for a novel on-disk format to gel. There are always random bugs that rely on particular sequences of allocations/deallocations, races, etc. Using an off-the-shelf filesystem that allocates the blocks and then stays out of the application's way, like XFS, probably saved these guys at least one year of screwing around.
Even if Facebook ends up not making any money, its contribution to the next generation of software engineering will be important. Solving major scaling issues makes Facebook a full-scale lab for tomorrow. They will set standards for how to approach some unique problems.