A Look at Backblaze’s Toshiba Hard Drives (backblaze.com)
96 points by ehPReth on May 13, 2015 | 50 comments



The first hard drive I used was a 40 MB Seagate, $900 in 1989. It was the size of a hefty phone book.

The first NetApp Filer I used had 4 GB drives, total capacity a few hundred GB, don't recall cost (not cheap), in 1997. It was the size of a small closet.

The first EMC I used had drives of a size I don't recall, total capacity in the TB, for unimaginable prices, in 2000. It was the size of a small room.

We're up to 8 TB drives. In 3.5". For under $300. It's mind-boggling.


When I consider that my phone has a 64 GB SD card smaller than my little fingernail, I feel the same way.



You win. :-)


>> HGST, Seagate and Western Digital drives all have the serial number of the drive on the top end of the drive. Toshiba drives do not.

Is simply printing and adding a label to each hard drive not enough, is it too error-prone, or what?


Yev from Backblaze here -> The drive manufacturer won't do it for us (at least not at our scale). We're currently chatting to some channel retailers that would be able to do it for us and allow us to scale a little bit. It was discouraging us from buying in larger quantities, but if the price of the drive + labeling works out, we'll try some larger orders.


The drives don't get delivered in pods for you, right?

So why is this not just part of the "take all drives from large box/crate/etc" process for the drives?

I.e., whoever is taking them out of the boxes puts them down, one by one, in a simple little labeling machine that slaps a label on each one (and records the label in a stupid database).

If you want something more advanced, the labeling machine could have a small camera that takes a picture of the top of the drive for further identification; you can then process and store all the barcoded info that exists in the image (unlike OCR, the barcodes should be 100% accurate).


We do not get them delivered in pods, no; we put them in at the datacenter. We don't do it because of the time constraints. Let's say it takes 30 seconds to do one individual drive. With 45 drives in a chassis that's about 22 minutes per pod, and we get a ton of pods and drives delivered at once. Our datacenter techs are busy enough as it is, so if we can offload work from them, we try to. Now, if these drives were inexpensive and had a 0% failure rate over 4 years, and the manufacturer/retail chain supplier still refused to put labels on them, yeah, we'd hire someone just to do labeling and do it ourselves, but for now the math doesn't work out.


Yev, if you're taking a drive out of the anti-static bag and installing it in a machine, it takes slightly less than 3 seconds to pull an asset tag barcode off a sheet of printed labels and stick it on the drive. Two scans (the drive's asset tag and the serial number on its label) and poof, you are done.

A very large consumer of drives at a previous employer :-) did this pretty efficiently. When we expanded our cluster for Blekko we did this for the 5000 drives we got from Western Digital (well, the scanning; we didn't really need an asset tag), and it goes really quickly with a code scanner in hand and a Python script recording the values.
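A minimal sketch of such a recording script, assuming a handheld scanner in keyboard-wedge mode (each scan arrives as one line of text) and a hypothetical workflow where the asset tag is scanned first and the drive serial second. The function names and the CSV layout are illustrative, not what Blekko actually used.

```python
import csv
from typing import Iterable, Iterator, TextIO


def pair_scans(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Group scanned lines into (asset_tag, serial) pairs.

    Assumes the scans alternate: asset tag first, then the drive serial.
    Blank lines (stray Enter presses) are skipped, and an unpaired
    trailing asset tag is dropped.
    """
    pending_tag = None
    for raw in lines:
        code = raw.strip()
        if not code:
            continue
        if pending_tag is None:
            pending_tag = code
        else:
            yield pending_tag, code
            pending_tag = None


def record_scans(stream: TextIO, path: str) -> int:
    """Append each (asset_tag, serial) pair read from stream to a CSV file."""
    count = 0
    # Append mode, so re-running the script never overwrites earlier scans.
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for tag, serial in pair_scans(stream):
            writer.writerow([tag, serial])
            f.flush()  # keep the file current if the session is interrupted
            count += 1
    return count
```

Calling `record_scans(sys.stdin, "drive_scans.csv")` at the bottom of the script would let the technician just keep scanning; the file name is an assumption.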


Sure, maybe my math was off, I'm not a datacenter guy, they are very efficient ;-) It comes down to cost/time. We did small-scale tests and they went pretty well. We're hoping to avoid the manual process on our end, but if we can't get a distributor to label them and it makes financial sense to buy the Toshibas, we'll do it ourselves :)


> We don't do it because of the time constraints. Let's say it takes 30 seconds to do one individual drive.

But in your article you claim that the unlabeled Toshiba drives lengthen the maintenance time by "a few minutes" every time they fail.

Since all drives eventually fail, wouldn't it make sense to trade those "few minutes" at the end of the lifecycle for a constant 30 seconds at the beginning?
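The trade-off can be put into numbers. This is a back-of-the-envelope sketch: the 45 drives per pod and the 30 s and ~3 s per-drive labeling estimates come from the thread, while treating "a few minutes" as 3 minutes (180 s) per failed drive is an assumption.

```python
DRIVES_PER_POD = 45  # from the thread: 45 drives per chassis


def labeling_minutes_per_pod(seconds_per_drive: float) -> float:
    """Total time to label one pod's drives, in minutes."""
    return DRIVES_PER_POD * seconds_per_drive / 60


def break_even_failure_fraction(label_seconds: float,
                                extra_seconds_per_failure: float) -> float:
    """Fraction of drives that must fail in service for labeling to pay off.

    Labeling costs label_seconds on every drive up front; an unlabeled drive
    costs extra_seconds_per_failure only if it fails while deployed.
    """
    return label_seconds / extra_seconds_per_failure


print(labeling_minutes_per_pod(30))  # 22.5 (the 30 s estimate)
print(labeling_minutes_per_pod(3))   # 2.25 (the ~3 s estimate)
# Assumption: "a few minutes" taken as 3 minutes (180 s) per failed drive.
print(break_even_failure_fraction(30, 180))
```

Under these assumptions, labeling at 30 s per drive pays off once more than about 1 in 6 drives fails while still deployed; drives retired before they fail never recoup the cost, which is where the argument hinges.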


Certainly, at least mathematically. If we ramp up the Toshiba purchases and cannot get them pre-labeled, we'll definitely do it on our end, though we think we'll be able to get them labeled ahead of time from some channel partners. At least we hope, signs look good :)


My experience of buying 20k disks at a time is that even those volumes aren't significant to the manufacturers. So essentially you are asking the distributor to label them.

Given that you have multiple vendors and you also seem to have a risk even in handling any drive, I would suggest that asset tagging the disks and using a Bluetooth scanner, or an Android data entry app, would make sense for all your disk assets. You can then automatically document and track the entire lifecycle of a disk live, as it is inserted or removed.
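A minimal sketch of the lifecycle log described above, using SQLite from the Python standard library. The schema, the asset tag format (`BB-000123`) and the pod/slot location string are all hypothetical; each scan at insertion or removal would become one row.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event log; one row per insertion or removal."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS drive_events (
               asset_tag TEXT NOT NULL,
               serial    TEXT NOT NULL,
               location  TEXT NOT NULL,  -- hypothetical pod/slot string
               event     TEXT NOT NULL CHECK (event IN ('inserted', 'removed')),
               at_utc    TEXT NOT NULL
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, asset_tag: str, serial: str,
           location: str, event: str) -> None:
    """Append one lifecycle event, timestamped in UTC."""
    conn.execute(
        "INSERT INTO drive_events VALUES (?, ?, ?, ?, ?)",
        (asset_tag, serial, location, event,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def history(conn: sqlite3.Connection, asset_tag: str) -> list:
    """Return (location, event) pairs for one drive, in insertion order."""
    rows = conn.execute(
        "SELECT location, event FROM drive_events "
        "WHERE asset_tag = ? ORDER BY rowid",
        (asset_tag,),
    )
    return rows.fetchall()
```

The CHECK constraint rejects anything other than the two event types, so a mistyped entry fails loudly instead of silently polluting the history.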

Your refusal to generate your own labels seems strange.


Hi Yev, just want to say that I love these posts that Backblaze does regarding drive reliability. Thanks.


Cool! Well, that's why we write them, people seem to dig them! And we have a lot of stats and data from our experiences to share, so it's mutually beneficial!


Sounds like a job for a temp.


I was thinking an intern, but yeah... sounds like a great way for somebody low-level to get their foot in the door.


Scrawling the last three digits of the serial number onto the drive end with a permanent marker would probably suffice.
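Whether three characters are enough depends on how serials are distributed. A quick birthday-problem check, assuming (an assumption, not something the thread establishes) that the last three digits behave like independent, uniformly random values, estimates how often two drives in one 45-drive pod would share them.

```python
import math


def suffix_collision_probability(n_drives: int, n_values: int = 1000) -> float:
    """Probability that at least two of n_drives share the same suffix.

    Assumes suffixes are independent and uniform over n_values
    (1000 for three decimal digits); real serials may not be.
    """
    if n_drives > n_values:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = math.prod((n_values - i) / n_values for i in range(n_drives))
    return 1.0 - p_all_distinct


print(f"{suffix_collision_probability(45):.2f}")  # ~0.63
```

Under that assumption a shared suffix within a pod is more likely than not, though the drive's slot position would still tell the two apart. Serials that are sequential within a batch, or suffixes that include letters (the serial Z252A34AS quoted later in the thread ends in letters, which enlarges `n_values`), would change the picture considerably.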


I'd be surprised if they didn't actually have an automated system to scan and record the serial numbers.


It wouldn't be hard to determine whether the barcode/number on the end is unique or not - if it isn't, you'd probably find collisions very easily especially within ones of the same batch.

I did a bit of searching (there's a noticeable shortage of photos of Toshiba HDD ends on the Internet...) and figure that it's probably some sort of batch code - the 3 I could find and read were POU34250025173, POU34250027527, and POU37250019573. The one in Backblaze's photo is POU34350016620.
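A small sketch of that uniqueness check, applied to the four codes quoted above. Four samples cannot prove the codes are unique; the same check over many drives from one batch would be more telling. `os.path.commonprefix` compares strings character by character, which is what is wanted here.

```python
import os
from collections import Counter

# Codes read off Toshiba drive ends, as quoted in the comment above.
codes = [
    "POU34250025173",
    "POU34250027527",
    "POU37250019573",
    "POU34350016620",  # the one in Backblaze's photo
]


def duplicates(values: list[str]) -> list[str]:
    """Return the codes that appear more than once (collisions)."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


print("collisions:", duplicates(codes))               # collisions: []
print("shared prefix:", os.path.commonprefix(codes))  # shared prefix: POU3
```

No collisions among these four, and only the short `POU3` prefix is shared, which is consistent with, but far from proof of, the batch-code guess.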


Why change a process that works without a compelling reason?


Didn't you brag about taking HDDs out of external drives to use in your DC? But now you are saying you can't even put a label on a drive?


Yev from Backblaze here -> sure we can, but the datacenter guys are very analytical and take time into consideration, since theirs is at a premium. So they could do it in the datacenter, but we'd much rather have them prelabeled.


This could be automated. A very simple computer with a SATA port and a USB label printer should be able to neatly print a tag with relevant drive data for every drive it sees through the SATA port.

At least the human error part would be mitigated. With some clever connector design, I guess a drive could be labeled in 5 seconds or less. If you do it for all drives before mounting, you'd get a single standard label for every unit.
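A minimal sketch of the software side of such a station, assuming a Linux box with smartmontools installed: read the model and serial from `smartctl -i` and turn them into label text. Driving the USB label printer itself is printer-specific and left out; `label_text` and its two-line layout are hypothetical.

```python
import re
import subprocess


def parse_smartctl_info(text: str) -> dict:
    """Extract model and serial from `smartctl -i` output (ATA drives)."""
    fields = {}
    for key, name in (("Device Model", "model"), ("Serial Number", "serial")):
        m = re.search(rf"^{key}:\s*(.+?)\s*$", text, re.MULTILINE)
        if m:
            fields[name] = m.group(1)
    return fields


def label_text(fields: dict) -> str:
    """Two-line text for a (hypothetical) label printer."""
    return f"{fields.get('model', '?')}\nS/N {fields.get('serial', '?')}"


def read_drive(device: str) -> dict:
    """Query one drive; usually needs root. Not run automatically."""
    # smartctl's exit status is a bitmask, so a nonzero code is not
    # treated as fatal here; only the parsed fields matter.
    out = subprocess.run(["smartctl", "-i", device],
                         capture_output=True, text=True).stdout
    return parse_smartctl_info(out)
```

For a drive reporting the model and serial from the pod log quoted later in this thread (TOSHIBA DT01ACA300, Z252A34AS), `label_text(read_drive("/dev/sdb"))` would produce a two-line label with those values; the device path is only an example.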


Definitely something to consider if we scale this up. Right now if we were to propose that to our ops team their heads might explode. Gotta make sure they are open to new workflows first :D


> Right now if we were to propose that to our ops team their heads might explode.

Please don't. I wouldn't like to be responsible for such a disaster. Those are nice folks.


They ARE nice folks. I shall tell them you said so. They like the positive reinforcement :)


Slightly off topic: I love the Backblaze reports. I'm in a situation where I need some 2.5" drives, but I'm having a hard time finding solid research data on their reliability. Anyone have any tips?


Do you need high performance, energy efficiency, or a lot of space?

Performance-wise, I have nothing but good things to say about the WD Black series and recent 7200rpm drives from HGST (formerly Hitachi, now owned by WD). Of course, neither is a match for a decent SSD, but the 1TB models are 1/10 the price of a similarly sized SSD :)

For energy efficiency, on the other hand, any 5400rpm drive from WD or HGST will do. They are quiet and reliable.

As for Seagate, I've had at least two of their drives fail on me in the last few years, not to mention they feel significantly slower than similar drives from WD and HGST in typical laptop usage. Even a 5400rpm WD Blue can run circles around a 7200rpm Seagate drive.

<evidence type="anecdotal" />


A 1TB HGST 7K drive is £50.
A 1TB Samsung 850 EVO is £300.

I thought it was interesting that your "1/10th the price" sounded OK but was actually more like 16-20% of the cost. I'm constantly impressed by how SSD prices keep falling.
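The arithmetic behind that figure, using only the two prices quoted above:

```python
hdd_1tb_gbp = 50   # 1TB HGST 7K drive, price quoted in the thread
ssd_1tb_gbp = 300  # 1TB Samsung 850 EVO, price quoted in the thread

ratio = hdd_1tb_gbp / ssd_1tb_gbp
print(f"HDD costs {ratio:.1%} of the SSD")  # HDD costs 16.7% of the SSD
```

A true 1/10 ratio would need either a £30 hard drive or a £500 SSD at these capacities.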


Yeah, the prices seem to have gone down again since the last time I checked :)


I find the problem with the lack of a serial number on the side of the hard drive to be a little silly. How hard would it be to get someone in your firm to simply place labels with serial numbers on the side of the Toshiba drives? It wouldn't take more than 10-20 minutes with a label maker.


> It wouldn't take more than 10-20 minutes with a label maker.

That does introduce a human-error-based point of failure, though: a few from a batch could have their labels mixed up.

Given how Backblaze's setup is described, someone powering off the wrong drive or entire node by mistake (because a non-failed drive was misidentified) will not affect service at all; the built-in resilience to hardware failure will easily cover it. There will still be some impact, though, if only in wasted man-hours (for the original error and any resulting investigation and relabelling effort) and some light performance degradation while the affected node is brought back into service.

So, for the number of drives they use, having the manufacturers label drives consistently is perhaps valuable enough that its absence is worth complaining about.


Looks like Backblaze finally fixed SSL. It took them a little while, and some whining and stupid excuses (1), but we are finally there.

1: https://news.ycombinator.com/item?id=8999036


USA! USA!


>> Failure: Disk 0491/sdag doesn’t contain a valid partition table
>> Pod0491:
>> x Replace sdag (Z252A34AS) with a new 3TB Toshiba DT01ACA300
>> x Reboot Pod0491 and re-add new sdag to sync

I wish they fed that to a sound synthesizer with a 'Borg' voice. Coolest datacenter ever.


In Mac OS X Terminal:

say -v Trinoids "Failure: Disk 0491/sdag doesn’t contain a valid partition table. Pod0491: Replace sdag (Z252A34AS) with a new 3TB Toshiba DT01ACA300. Reboot Pod0491 and re-add new sdag to sync. Your biological and technological distinctiveness will be added to our own"


Not sure if I want to tell the kids about that one or not. I may not hear the end of it. (or at least, the half hour each till parental controls kicks them out)


It's even more fun if you teach them "sleep" too, so it will start speaking at unexpected times.
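The same trick can be scripted: turn the pod log lines into one sentence and hand it to macOS `say` after a delay, the way `sleep` works in the shell. This is a sketch; `build_message` and `speak_later` are hypothetical helpers, and the Trinoids voice may not be installed on every macOS version.

```python
import shutil
import subprocess
import time

# The pod log lines quoted earlier in the thread.
LOG_LINES = [
    "Failure: Disk 0491/sdag doesn't contain a valid partition table",
    "Pod0491:",
    "x Replace sdag (Z252A34AS) with a new 3TB Toshiba DT01ACA300",
    "x Reboot Pod0491 and re-add new sdag to sync",
]


def build_message(lines: list[str]) -> str:
    """Join log lines into one speakable sentence."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if line.startswith("x "):  # drop the checklist markers from the log
            line = line[2:]
        if not line:
            continue
        if line[-1] not in ".:!?":  # end each item so the voice pauses
            line += "."
        cleaned.append(line)
    return " ".join(cleaned)


def speak_later(text: str, delay_seconds: float, voice: str = "Trinoids") -> None:
    """Wait, then speak text with macOS `say` (macOS only)."""
    time.sleep(delay_seconds)
    if shutil.which("say") is None:
        raise RuntimeError("`say` is only available on macOS")
    subprocess.run(["say", "-v", voice, text], check=True)
```

`speak_later(build_message(LOG_LINES), 600)` would announce the failure ten minutes later; appending the Borg line from the `say` example above is left to taste.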


I love that it knows how to pronounce 3TB.


That is cool, although it doesn't know the difference between TB and Tb.


Thank you. That made my day. My son and I are going to have some fun with this one.


You're wonderful.


I think I prefer Zarvox ($ say -v Zarvox ...).


It's also a good choice. Probably sounds more like an individual Borg, whereas Trinoids has that "multi-voice" thing that the Borg use.


Ha! This is way cooler than Google translate!


In addition to what Clunkclunk said, you can also do the following -> http://codewelt.com/proj/speak?lang=en-us&text=Failure:%20Di...


I still wish they would clarify the language in their ads. When they say "everything attached to your computer, all your external and internal drives completely unlimited for $5/month", they really should make clear that they have no way to back up a NAS device.


Well, technically a NAS isn't connected to your computer. It's connected to your network, which your computer is connected to.


I just bought one of the 3TB models described here to replace a failing 1.5TB Seagate. It's nice to see my choice backed up by data; none of the 3TB drives in my price range were particularly well-reviewed.



