If you are the target audience for these drives, you likely don't even consider regular (i.e. including random) IO as a use case for modern HDDs. 10+ TB HDDs really only make sense for sequential IO, even if they technically still support random writes (e.g. the 10 & 12 TB PMR drives): the orders-of-magnitude difference between random and sequential IO performance makes this a no-brainer.
If you look at the design of, for example, Dropbox's Magic Pocket, or Infinidat and Qumulo, you'll notice that they keep HDD access as sequential as possible. And if your storage layer is already optimized for sequential writes, why not take the opportunity to get some capacity "for free" by adopting SMR drives?
Copy-on-write filesystems can probably be optimized for SMR by using TRIM commands to punch holes and then rewriting the live content sequentially into a new zone. AFAIK both zfs and btrfs have plans to do this.
That way they can be useful for more than archival.
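The garbage-collection pass is conceptually simple. A minimal Python sketch of the idea, with zones simulated in memory and reset() standing in for TRIM/zone reset (this is not actual zfs or btrfs code):

    # Simulated SMR-style zones: writes only ever land at the zone's write pointer.
    class Zone:
        def __init__(self, size):
            self.size = size
            self.wp = 0            # write pointer (bytes appended so far)
            self.records = []      # (record_id, payload) pairs, in write order

        def append(self, rec_id, payload):
            if self.wp + len(payload) > self.size:
                raise IOError("zone full")
            self.records.append((rec_id, payload))
            self.wp += len(payload)

        def reset(self):           # stands in for TRIM / zone reset
            self.records, self.wp = [], 0

    def gc_zone(old, new, live_ids):
        """Copy only the still-referenced records sequentially into a fresh
        zone, then discard the old zone in one shot."""
        for rec_id, payload in old.records:
            if rec_id in live_ids:
                new.append(rec_id, payload)
        old.reset()

    old, new = Zone(1 << 20), Zone(1 << 20)
    old.append("a", b"x" * 100); old.append("b", b"y" * 100); old.append("c", b"z" * 100)
    gc_zone(old, new, live_ids={"a", "c"})
    print(new.wp, old.wp)          # 200 0: live data rewritten sequentially, old zone free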
I was thinking that F2FS might be a good filesystem to use as a base under an object storage abstraction layer (like Ceph)...
However, since I first saw the news about these drives a few days ago, Samsung has also axed some Linux devs, which gives me pause and makes me reconsider the long-term viability of that filesystem...
A full-blown filesystem is overkill for an object store. You could use something like libzbc ( https://github.com/hgst/libzbc ) to write directly to the SMR drives at the block level.
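The bookkeeping an object store needs on top of raw zones is pretty small. A rough Python sketch of the idea, using in-memory stand-ins for zones rather than real libzbc calls (the names below are illustrative only, not libzbc's API):

    # Append-only object store over fixed-size zones (in-memory stand-in).
    ZONE_SIZE = 256 << 20                       # assumption: 256 MiB zones

    zones = [bytearray() for _ in range(4)]     # pretend SMR zones
    index = {}                                  # object key -> (zone, offset, length)

    def put(key, blob):
        for z, buf in enumerate(zones):
            if len(buf) + len(blob) <= ZONE_SIZE:   # writes only ever go at the tail
                index[key] = (z, len(buf), len(blob))
                buf.extend(blob)
                return
        raise IOError("no zone with enough free space")

    def get(key):
        z, off, length = index[key]
        return bytes(zones[z][off:off + length])

    put("photo-123", b"\x89PNG...")
    assert get("photo-123").startswith(b"\x89PNG")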
I believe Ceph has now abstracted the drives away through BlueStore, which simply puts a large RocksDB database on the drive, bypassing most of the functionality a filesystem offers. It should be much easier to make an SMR-compatible version of RocksDB's LSM-tree backend than to write a full-blown filesystem.
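The reason an LSM tree is such a natural fit is that its write path is already append-only: random-key updates are buffered in RAM and flushed as large sorted runs. A toy Python sketch of that write pattern (not RocksDB's actual on-disk format):

    # Minimal LSM-style flush: buffer writes in memory, then emit one sequential,
    # sorted, append-only run. This write pattern maps nicely onto SMR zones.
    import json, os, tempfile

    memtable = {}

    def put(key, value):
        memtable[key] = value          # random-access updates stay in RAM

    def flush(path):
        """Write the whole memtable as a single sequential, sorted run."""
        with open(path, "w") as run:
            for key in sorted(memtable):
                run.write(json.dumps({"k": key, "v": memtable[key]}) + "\n")
        memtable.clear()

    put("user:42", "alice"); put("user:7", "bob")
    run_path = os.path.join(tempfile.mkdtemp(), "run-000001.jsonl")
    flush(run_path)                    # one big sequential write, no in-place updates
    print(open(run_path).read())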
Random access still seems like it might be an issue. But I would love to see how e.g. nilfs2 (maybe on top of software RAID) benchmarks against zfs on these big drives.
Depends... my NAS is mostly used for video/audio archives, and generally only one device in the house is using it, playing back a single file or listing a directory. It would probably be fine in that use case. Beyond that, it's used for backups.
Seems like a decent use case. I'm not sure how well they'd work in a NAS with RAID 5/6, though. I'd already been considering a new NAS with 4-6 drives at 8-12TB. Random I/O isn't my primary use, and I'm sure there are others in the same position at these sizes.
This is one reason why RAID 0+1 is a best practice and RAID 5 & 6 are no longer recommended: the array takes too long to rebuild, which leads to multi-disk failure situations.
RAID6 should be fine rebuilding online (in RAID5 mode) even under a moderate write load.
Of course, one should source RAID disks from three different vendors, to ensure that they are from different batches and are not going to fail at approximately the same time.
Good advice, though I once had about half a dozen drives (out of a 12-drive RAID-Z2 with 2 hot spares) fail within a few weeks of each other despite being sourced from separate batches. (Seagate 3TB drives; I think there have been articles on how bad that series was.)
I don't know how I survived those Seagates. They lasted maybe a year and then started dropping like flies. Synology seems to recommend identical drives, as I recall, but works fine with different sizes and makers AFAICT.
CRUSH is an example (and not the first) of a "distributed rebuild" approach: you have an array of N drives (with N large, e.g. 100), and if 1 drive fails, you read in parallel from all (N-1) remaining drives, while distributing the reconstructed data across the remaining available capacity of all (N-1) remaining drives.
In effect, you get the total bandwidth of (N-1) HDDs working in parallel. And the bandwidth of 100 HDDs doing sequential IO in parallel is really massive (~10 GB/s).
Examples of companies claiming to use this approach are Qumulo (rebuild in a couple of hours), Infinidat (a couple of tens of minutes), ClusterStor GridRAID (now part of Seagate, I think), and "Declustered RAID" in GPFS (IBM).
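Back-of-the-envelope, with assumed round numbers (100 MB/s of sustained sequential throughput per drive, a 12 TB failed drive), the difference versus a classic spare-disk rebuild is dramatic:

    # Rough rebuild-time estimate for a declustered / distributed rebuild.
    n_drives       = 100
    per_drive_mb_s = 100          # assumption: sustained sequential MB/s per drive
    lost_tb        = 12           # capacity of the failed drive

    aggregate_mb_s = (n_drives - 1) * per_drive_mb_s        # ~9.9 GB/s in total
    rebuild_hours  = lost_tb * 1e6 / aggregate_mb_s / 3600
    print(f"{aggregate_mb_s / 1000:.1f} GB/s aggregate, ~{rebuild_hours:.1f} h to rebuild")

    # Versus a classic rebuild bottlenecked on a single spare drive:
    print(f"single-drive rebuild: ~{lost_tb * 1e6 / per_drive_mb_s / 3600:.0f} h")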
As opposed to RAID 5, where the array is toast if any two disks fail, RAID 6 raises that threshold to three.
However, both RAID 5 and 6 have two huge problems:
Data in flight at write time (power/hardware failures are more likely to corrupt the array, especially silently, which is the worst outcome).
Parity calculations require you to spin up the whole RAID 5/6 array during a rebuild, massively increasing the chance of a multi-drive failure and a lost array. If one close-to-EOL drive dies, putting its sister drives through what is essentially an all-day, full-tilt stress test is a terrible, terrible idea, and it keeps getting worse (taking longer) as drive sizes grow.
RAID 0+1 mostly sidesteps these issues at a modest increase in drive count; it's a no-brainer for most setups.
> Data in flight at write time (power/hardware failures are more likely to corrupt the array, especially silently, which is the worst outcome).
How is that? RAID doesn't affect data-persistence behavior in any meaningful way. FUA/sync-cache/etc. are supported by RAID controllers the same as by the underlying disks in writeback environments, parity updates included. Put another way, if you FUA or flush the writeback cache, those operations won't complete in a properly implemented RAID environment until the data is persisted somewhere, even if that means passing FUA down to the underlying storage. Granted, there are a number of ways to mess this up, e.g. RMW cycles in a controller that doesn't have some kind of persistent memory and flush-on-power-restore. Anyway, none of this is any worse than what happens in any other writeback-cached storage technology.
Finally, all this fearmongering about loss on rebuild should also be explored more fully in the context of the fact that decent RAID systems run background scrub operations on a regular basis. Those operations by themselves are going to "stress test" the array regularly while it's consistent and not degraded. I actually have a fair amount of experience in this area, and I'm here to tell you that if you think this is a risk, consider what happens to non-RAIDed, unscrubbed drives with a lot of data silently bitrotting on the platters. That latter effect is nearly always the problem in RAID environments when someone starts a rebuild on drives/sectors that have gone unread for extended periods of time. But in the case of RAID, a properly implemented system won't fail a drive for a single read failure during a rebuild; instead it reconstructs from the other drives, leaves the drive online long enough to complete the rebuild, and only then takes it offline.
Basically, RAID 1 setups don't actually fix any of these problems, except through the use of massive additional parity-disk overhead, overhead that can also be applied to other RAID algorithms to much better effect. AKA a mirrored RAID 6 provides far more protection than a mirrored RAID 0. Similar levels can be had with 6+6 in environments where that is possible, with trivial capacity overhead.
RAID 5/6 requires parity calculations before data can be written to disk. That is a significant amount of extra in-flight data, especially at high write speeds, and it is what causes the in-flight-data problem.
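For concreteness, the parity math in question is just an XOR across the stripe; a toy Python sketch (not any particular controller's implementation):

    # RAID5-style parity: the parity strip is the XOR of the data strips.
    # Losing any ONE strip is recoverable by XOR-ing the survivors together.
    def xor_strips(strips):
        out = bytearray(len(strips[0]))
        for strip in strips:
            for i, byte in enumerate(strip):
                out[i] ^= byte
        return bytes(out)

    data   = [b"AAAA", b"BBBB", b"CCCC"]   # three data strips of one stripe
    parity = xor_strips(data)              # must be computed before the stripe hits disk

    # The drive holding strip 1 dies: rebuild it from parity plus the other strips.
    rebuilt = xor_strips([data[0], data[2], parity])
    assert rebuilt == data[1]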
Battery and flash backup on controllers doesn't fix the problem of hardware failure (which is significant, especially on big, hot controllers).
Again, decent controllers have ECC protection and the like, and frequently are available in HA configurations if your worry is controller failure (along with redundant/dual data paths to the media via SAS/NVMe/etc). Plus, there are a long list of technologies that can be enabled at the HBA layer and pushed all the way to the media (T10 DIF/DIX comes to mind).
But much of this micro-level redundancy is overkill, as frequently one uses some kind of application-level HA/redundancy as well. So losing a RAID 5/6 disk in a single machine is the functional equivalent of losing any combination of RAID 0/1 disks in the same machine. You still need the higher-level redundancy as well as a backup plan.
We could start breaking the discussion up into fabric-attached vs direct-attached RAID vs software, but I think it's sufficient to say that RAID 5/6 doesn't _increase_ the failure surface in any meaningful way when you're not using fly-by-night RAID.
Edit: Maybe what you're trying to say is that cache flush/FUA operations for a given piece of data don't cover the parity calculation and buffers? That is false: a controller should not be responding to FUA/etc. until the entire block (including the parity) has been persisted. So if the controller dies during the operation, the host OS is fully aware that the operation didn't complete. The given block is of course left in some unknown state in this case, but that is true of any write operation that fails like this, regardless of WT/WB/RAID/etc.
The biggest problem with RAID 5 is that it is completely unprotected against silent corruption, because there is no way for the RAID layer to know which piece of data is the corrupted one (and as a result it has to decide whether the parity is correct or not; most RAID implementations just ignore silent corruption completely, so the parity is always assumed to be wrong in such cases).
So even if you rebuild an array, a bad drive might have already silently blown away your data. If you compare this with ZFS's RAID-Z1 (same parity, different design), you get detection of and protection against silent data corruption.
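The difference is easy to see with a toy example: store a checksum per block (in the spirit of what ZFS does, not its actual on-disk format) and the corrupt strip identifies itself, so you know which one to reconstruct from parity:

    # Single parity alone tells you the stripe is inconsistent, but not WHICH
    # strip is wrong. Per-block checksums pinpoint the corrupted one.
    import hashlib

    def checksum(block):
        return hashlib.sha256(block).hexdigest()

    blocks    = [b"strip-0", b"strip-1", b"strip-2"]
    checksums = [checksum(b) for b in blocks]       # written alongside the data

    blocks[1] = b"bitrot!"                          # silent corruption on one drive

    bad = [i for i, b in enumerate(blocks) if checksum(b) != checksums[i]]
    print("corrupt strip(s):", bad)                 # -> [1], so we know what to rebuild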
> through what is essentially an all-day, full-tilt stress test is a terrible, terrible idea
The rebuild isn't putting the disks under stress. The sister drive has already failed silently but you only notice this once you start the rebuild. The solution is to check the disks once a week by fully reading every sector.
The normal answer here is to make sure that each side of the RAID10 (RAID01 is something different and much less common) mirror uses drives from a different vendor, thus giving each side a different bathtub curve / failure rate and mitigating the impact of a bad batch. This is a nice advantage over parity-based setups like RAID6 (since replicating this with RAID6 would require finding a unique vendor for each array member, and there are only so many vendors).
For archival purposes, though, you're probably better off with a normal RAID1 + some kind of JBOD setup (like with LVM); striping makes data recovery more difficult should you indeed lose all RAID1 sides of a given member.
You can upgrade a 2-disk RAID 1 to a 3-disk RAID 5, then chain them into RAID 0 as usual. That gives you a better chance of keeping your data intact, hopefully without seriously lowering the write speed.
I already have a lot of places where recovering from a full wipe is almost not worth it: customers with hundreds of TB of data in 'prosumer' NAS devices that are chock-full of regular drives.
Tape sequential throughput is higher than HDD. It's the latency/seek time that gets you. But for a recovery scenario, you can hopefully just do a giant sequential scan.
So does anyone know of a way to take a set of files and write them to an HDD, in NTFS or exFAT format, in a single sequential write? Essentially building the FS on the fly (because we're talking about datasets that are much too large to fit into memory)?
Building the MFT first is pretty much what I want to do. But I'm not aware of any utilities that can handle it, nor where to start with writing it myself...
I have a project where I routinely need to copy large amounts of data (3 to 8 TB) to a hard drive. The problem is, my files are all 512 KB, so this is much slower than it could be...
If I write it as a single tar file I get excellent throughput, but the users who need to be able to work with the drive are unable to handle a tar file. They need to be able to plug the drive into a Windows computer and have it "just work".. which presents some problems.
That's how you copy an existing filesystem. It's not how you take an arbitrary subset of files and put them into a new filesystem with minimal write amplification.
I was about to buy 3x 12TB Toshiba drives for deep learning datasets; now I need to reconsider... Does anyone know the current reliability stats for >10TB drives? My old 6x 4TB HGSTs in my NAS have been running without a single problem for the past 3 years...
Excellent. All recent big disks over 8 TB from all vendors have excellent reliability. HGST/WD helium drives are particularly good; I have configured and installed many hundreds, from 6 to 12 TB, over the past 4 years, and not a single one has failed yet.
If you are doing serious deep/machine learning, you often get massive datasets. You might need to pre-process substantial portions of them for different models you want to try, or train custom models on, e.g., 4k/8k video footage (think DLSS, self-driving cars, or drone footage); space gets exhausted pretty quickly. You might also want multiple independent drives, since you may do all of this in parallel: it would be too slow on a single drive, and even RAID across two drives would pay a large performance penalty on seeks.
Internet speeds are slow in some places, such as France (specifically the Pays de Gex near Geneva, where my parents live). My dad uses iCloud, but he drives to CERN to upload (he just retired).
I have 18 TB: 5 TB Seagate (x2), 4 TB Western Digital, 2 TB Western Digital, and 2 TB internal.
Backups take the most space - I fix laptops for friends from church, and they don't back up but still want their files to be safe. I had to shuffle some files around to free up 650 GB for a recent repair, mostly photos & videos.
Virtual machines use a lot of space too. I made VMWare Fusion images of every Mac OS version 10.5-10.13, Windows 95, 98, 2000, XP, 7, and 10, in several languages ( https://peterburk.github.com/i2018n ), and some Linux distros.
Music, mostly from repaired iPods back in high school, accounts for a lot as well. There's some movies too, though I missed a chance to get 2 TB from a friend because I didn't have enough space at the time. If I upload those, even those that I legally ripped from CDs & DVDs, I'm worried that it'll trigger content filters.
For these, local disks are more useful than cloud services in my opinion.
I'm going to highly recommend looking at drives with WizTree, which does a very fast display of what's taking up space based on parsing the MFT rather than scanning the entire drive.
You may find that there's a massive amount of data where you wouldn't expect it, such as in the Windows Temp directory - if so and it's a bunch of files named "cab_something", you can kill all of those and prevent recurrence with a little housekeeping.
My dad and I created backups locally, mailed each other hard drives, and now just do a weekly rsync. Storage is large, but network traffic is relatively low.
Having a "backup buddy" you can swap drives with once in a while is never a bad idea. Encrypted backup drives can save your bacon if you're ever caught in a bad situation.
Before my current crop of 8TB WD Reds, I ran Toshiba enterprise drives and they were extremely reliable for me. None of them failed in a 24x7 hardware RAID 6 environment after a few years; I only replaced them to upgrade capacity.
I've read that Toshiba is based on old HGST process and those drives were extremely reliable, which is why I wanted to get them for the new Deep Learning workstation.
The 15TB drive packs in 1108 Gbit/inch2. That works out to roughly 582 nm2 per bit, i.e. a square about 24 nm on a side. This is small, but the transistors in flash are smaller [1]. As mind-blowing as the numbers (for both technologies) are in the referenced article, that article is now 2.5 years old. Is anyone aware of more recent numbers?
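For what it's worth, the quick arithmetic behind that per-bit figure:

    # Areal density -> per-bit footprint.
    gbit_per_in2 = 1108
    nm_per_inch  = 25.4e6

    bits_per_in2 = gbit_per_in2 * 1e9
    nm2_per_bit  = nm_per_inch ** 2 / bits_per_in2
    print(f"{nm2_per_bit:.0f} nm^2 per bit, ~{nm2_per_bit ** 0.5:.0f} nm on a side")
    # -> roughly 582 nm^2, i.e. a square about 24 nm on a side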
From my experience in the HDD industry, I remember that a magnetic bit of data is about 13-15nm long and about 40-60nm wide (narrower tracks for SMR). The length of a bit is constrained by the grain size of the magnetic media. However, the width of a bit (prior to SMR) is actually constrained by the size of the write head. I don't remember why, but I think it has to do with the fact that the write current is something like 40 mA, and the magnetic flux density at the write element is something like 1.5 T (no, that's not a typo).
I'm not an expert on transistor pitches, but here's a chart from Wikipedia for the 10nm - https://en.wikipedia.org/wiki/10_nanometer
It's kind of impressive for HDDs considering that it's a 2 inch long mechanical arm that is able to move with that level of precision.
"I'm going for the 15TB instead of the 14 so I have extra space for backups," said nobody ever. We are clearly close to the end of spinning rust, absent some new breakthrough.
HAMR and then HDMR are expected to allow data densities to increase by 5 to 10 times what is currently achievable. HAMR will probably start showing up in a year or two.
Spinning drives are definitely not going away anytime soon unless there is a much more significant drop in the cost of SSDs.
Investment in new spinning-drive technologies is going away, though. Nobody wants to spend R&D money coming up with patents and ideas which will be worthless in 5 years when SSDs overtake.
Science investment requires a new technology to have a prospect of a return for most of the ~20 year patent lifespan for it to look like a good investment, and spinning bits of metal aren't that right now.
In the same way that hard drives didn't kill off tape, SSD won't kill off hard drives. The price differential is too great for many applications and they have different operational strengths and weaknesses.
Tapes have a use case that hard drives do not. Tapes are the lowest cost/GB stored and are more shelf stable than hard drives.
SSDs are higher performance than HDDs and have none of the packaging constraints. Flash storage is going to be put into everything and the economies of scale look quite good.
Storage capacity is scaling, but the r/w speeds of HDDs aren't keeping up. Follow the trend line and we see huge HDDs that are functionally useless due to how long disk operations take.
HDDs only exist above tape because of their performance, and only exist below SSDs because of cost. Tape is the floor and SSDs are the quickly lowering ceiling; HDDs are likely to be crushed in between.
You might be right, but keep in mind that flash has been scaling due to shrinking semiconductor feature sizes (and additional layers, etc.), so a large part of flash's core R&D and production costs is being spread over all the logic being produced. That has been hitting a wall, so while the capacity/price curves for flash look nice, they likely won't continue, which leaves open the possibility that if rust actually gets a 4-5x boost in the near future the current market trends will continue: SSDs for perf/power/size and mechanical hard drives for bulk nearline storage, leaving tape where it's been for the past 30 years, as an archival technology.
Horizontal feature sizes for flash memory stopped shrinking years ago. The continued improvements in density and production cost have been the result of R&D that is very specifically focused on 3D NAND flash memory and has little in common with R&D for logic circuit fabrication.
That said, on the horizon of multiple years, I agree that the future scalability of NAND flash doesn't look quite as promising as HAMR/MAMR for hard drives. How that translates into actual product demand and adoption will probably depend on the relatively unexplored question of how much performance per TB our applications actually need. 40+ TB hard drives might not be fast enough to actually serve as nearline storage for that volume of data without e.g. multi-actuator technology that essentially gives you more than one hard drive sharing a common spindle motor. Meanwhile, there's no question that QLC NAND flash definitely has adequate read latency and throughput.
Multi-actuator tech sounds interesting, and I wouldn't be surprised to see drives with 5, 10, 50 or 100 read heads per platter at some point.
With 100 independently moving heads per platter, random-access performance could improve by up to a factor of 100. That won't let them overtake SSDs, but it would at least let them close the gap.
So far, nobody has announced plans to manufacture hard drives with two read heads per platter, so speculating about 100 heads per platter seems rather unrealistic. The multi-actuator technology that is actually being developed by Seagate still has only one read head per platter, but out of the eight or so platters in a drive, the read heads for four of them will be controlled by one actuator and the read heads for the other four platters will be controlled by the second actuator.
Going all the way to 100 read heads per platter would be insanely expensive and would massively increase drive failure rates, while still leaving them about four times slower for random reads than the slowest $35 SSD on the market. This will never turn into a viable product.
You state that with an unwarranted degree of certainty. You're making the same argument, and mistake, that proponents of 'X is going to kill hard drives' have made for decades.
There have been many 'this will be the death of hard drives' technologies over the decades: zip drives, optical drives, tape drives (there was a time when they were predicted to be everywhere... never happened), CD (then DVD) writers, etc. Not to mention MRAM which has been the hottest tech that hasn't really happened yet for 3 decades. These were all going to be some combination of more durable and/or cheaper per Xb. But they all lacked the one critical advantage that hard drives had: massive economies of scale. Here's my prediction: spinning rust isn't going anywhere anytime soon.
Isn't most cloud storage currently almost entirely on HDDs at this point? At least, that's the impression I get from looking at Backblaze (probably the cloud storage provider with the most public statistics on what they use). So far, the use of SSDs is fairly minimal on the storage side, although I believe Backblaze does use them for things like boot drives and caching. (https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/)
I would agree that on consumer systems (which often have a single boot drive), SSDs make a lot of sense. These days you tend to only see HDDs at the very low end.
From a personal perspective, most of my PCs are completely SSD. But I also have a media server (the largest share of the storage reserved for MKV copies of my personal DVDs and Blu-rays), composed of RAID arrays of 6TB HDDs. Right now, the largest "common" SSD capacity is 4TB, roughly in the $800-$1200 range. 4TB HDDs by comparison are quite cheap, as low as $89 for a certain Seagate model (I used Western Digital Reds, which at the moment are a little higher: $115 for the 4TB, $178 for the 6TB model).
While I definitely wouldn't say "HDDs will always be around", the price difference right now is too high for the superior properties of SSDs to justify the switch. Backblaze seems to be in agreement here (https://www.backblaze.com/blog/ssd-vs-hdd-future-of-storage/). I guess the question is how long it will take for large-capacity SSDs to come down in price enough to be competitive. Until that happens, I imagine HDDs have a decent future left, if not in consumer devices then at least as drives for cloud / data center storage.
Backblaze is primarily a backup storage company. Their bread and butter is relatively cold data storage.
Lots of VPS and similar services offer SSD storage now, and I expect that to grow. The one I use didn't even offer a non-SSD option, and it was a pretty cheap provider.
SSDs offer orders of magnitude better random IO; I imagine that allows the hosting company to put more clients on each storage unit, lowering the effective cost of SSDs.
Seagate and WD are spending mountains of cash to develop technologies like HAMR and MAMR which they expect to take them up to 40TB drives. These technologies require entirely new fabrication processes, etc. Very capital intensive.
Not in the next 10 years; further out than that, I don't know. SSDs won't be cheaper than HDDs per GB in 5 years' time; the cost of building flash and scaling it down (multi-layer, 2.5D / 3D / 4D flash, whatever the manufacturers want to call it) is also going up. Return on investment is taking longer (even with the previously high NAND prices). Moore's Law is dead, and not only for CPUs.
Theoretically speaking, there should be a point where the TCO of NAND flash crosses that of HDDs. But as NAND scales down, its write endurance also goes down.
I was never a believer in HAMR, the technology the parent mentioned, which Seagate announced back in 2012; the idea just seems too unrealistic. The MAMR approach WD is taking, using a microwave-generated magnetic field instead of heat, is much better. BPM is also far off, though it has been in research for nearly a decade, and MAMR should be here in 2019. All of this R&D will come to fruition in the next few years, and we can expect HDDs to scale to 100TB within the next 10 years.
Multi-actuator hard drives will not close any performance gap. They will just help slow the decline in IOPS/TB that higher capacity drives bring.
They accomplish this by essentially being multiple hard drives sharing a common spindle and helium-filled enclosure. As Seagate is currently implementing the idea, you still have only one head per platter, and at most one independently moving head per platter (but currently the stack of platters is just divided into two groups). Thus, sequential performance does not improve at all (and actually is reduced by the number of independent actuators), and random I/O increases by a small integer factor when the gap between hard drives and the slowest SSDs is already more than two orders of magnitude. However, power consumption shouldn't be much higher for this kind of multi-actuator hard drive over existing hard drive designs.
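To put rough numbers on that gap (ballpark figures, assumed rather than measured):

    # Scale of the random-read gap for a dual-actuator drive vs. a cheap SSD.
    hdd_iops       = 150        # assumption: ~4K random reads/s for a 7200 rpm drive
    actuators      = 2          # Seagate's dual-actuator design
    cheap_ssd_iops = 50_000     # assumption: entry-level SATA SSD random reads/s

    print(f"dual-actuator HDD: ~{hdd_iops * actuators} IOPS")
    print(f"gap to a cheap SSD: ~{cheap_ssd_iops // (hdd_iops * actuators)}x")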
Back when I was an intern at Imprimis (before they got bought by Seagate), I worked with the Manufacturing Engineers for the Wren series of 5.25” drives, including the Wren VII, which was the first consumer SCSI drive with a capacity of 1GB or more.
I looked at the drive actuators at the time, and I was incredulous when the guys told me that all operations were serial. I asked why they didn’t do parallel reads and writes, and I was told that technology was already common for mainframes but too expensive for consumer gear.
So, fast forward from 1989 to now, and I’m sure that idea will come back — sooner or later.
Agree. And for a NAS, the performance of SSDs is unlikely to be required.
In fact, at one point I made the mistake of enabling SSD caching on a NAS. The SSD became the bottleneck because of the limits of SATA, i.e. one SSD on SATA is slower than 8 or 10 HDDs in RAID 5. So unless you really need very high IOPS, HDDs are likely to be good enough.
I'm curious who sold you a NAS with SSD caching that didn't support bypassing the cache for sequential I/O. That's a pretty basic and obvious feature, and it seems like the manufacturer must not have been taking their SSD caching feature seriously if they didn't implement bypassing.
No, it isn't. That's about 6x the price in the US and at least 5x the price in Australia. No combination of VAT and shipping fees can account for that big of a discrepancy. You're probably just looking at a retailer that has inflated the price while they're out of stock. Try getting a quote from someone who has stock in your country ready to ship.
You can't use flash storage for reliable unpowered archival. It degrades (gates leak electrons) over time, unlike magnetic storage. This is also unpredictable, as reliability depends on both operating and power-off temperatures, as well as existing wear. See: https://www.anandtech.com/show/9248/the-truth-about-ssd-data...
Is tape even cheaper than spinning rust? Last time I priced it, the per-GB costs were similar and the tape drives themselves are quite expensive. Tape is a more reliable backup, since the moving parts are not part of the storage, but it's not a cheaper backup.
Depends on your volume. I don't know how much IT departments typically pay, but on Amazon, hard drives start at about $20-25/TB, whereas LTO-7 tapes are about $10-12/TB. That's a significant savings, but you need to be storing at least a few hundred TB before you recoup the cost of the tape drive.
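Rough break-even math, assuming a standalone LTO-7 drive at around $3,500 (an assumption; actual prices vary a lot):

    # Break-even capacity for buying a tape drive instead of more hard drives.
    hdd_cost_per_tb  = 22.5      # ~$20-25/TB bare drives
    tape_cost_per_tb = 11.0      # ~$10-12/TB LTO-7 media
    tape_drive_cost  = 3500.0    # assumed price of a standalone LTO-7 drive

    savings_per_tb = hdd_cost_per_tb - tape_cost_per_tb
    print(f"break-even at ~{tape_drive_cost / savings_per_tb:.0f} TB")   # ~300 TB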
I know Amazon prices aren't exactly perfect for enterprise storage costs, but they imply tape is at least 4x cheaper (assuming you are using enough to amortize the cost of the tape drive):
EDIT: Commenter below points out this 6.25TB tape is actually 2.5TB physical -- still cheaper in terms of $/GB, but not the 4x I mention above -- closer to 2x.
LTO-6 (what you linked) is not 6.25TB, it’s 2.5TB, despite what Amazon says.
Then add the operational costs, which is the hard part, because the operational costs for a tape are very different from the operational costs for a hard disk.
That's an apples-to-oranges comparison. The "6.25TB" number for LTO tapes is assuming a fairly arbitrary 2.5:1 lossless compression ratio. The actual data capacity of the tape is only 2.5TB.
You can feed as many tapes as you wish into a single drive, meaning the cost of the tape drive becomes irrelevant past a certain number of tapes.