I'm left wondering whether the extension mechanisms in NTFS could be used to implement a better compression scheme with support at the application level (instead of the filesystem level).
Or whether Microsoft could sit down and define a new version of NTFS in which they replace LZNT1 with an "LZNT2" that has better compression ratios and requirements tuned for modern systems :)
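For context, the only hook applications get today is FSCTL_SET_COMPRESSION, and as far as I know the only format NTFS actually honors through it is LZNT1, so a hypothetical LZNT2 would presumably just be another COMPRESSION_FORMAT_* constant passed through the same call. A rough sketch of turning compression on for one file:

    /* Sketch: enabling NTFS compression on a single file from an application.
       The only algorithm NTFS will actually use here is LZNT1; a hypothetical
       LZNT2 would presumably show up as another COMPRESSION_FORMAT_* value. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        HANDLE h = CreateFileA(argv[1], GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ, NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        USHORT format = COMPRESSION_FORMAT_LZNT1;   /* or COMPRESSION_FORMAT_NONE to undo */
        DWORD returned = 0;
        if (!DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                             &format, sizeof(format), NULL, 0, &returned, NULL))
            fprintf(stderr, "FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());

        CloseHandle(h);
        return 0;
    }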
The justification that hard drives need to work everywhere is a little weird. Most drives stay in their host machine until they die. Requiring some recent version of Windows to read the files wouldn't be a hardship. A little checkbox that saves you X% storage space and Y% CPU time in exchange for such a requirement would be useful.
> The justification that hard drives need to work everywhere is a little weird.
I think it's a pure anachronism at this point. The requirement makes more sense to me in an era when small drive sizes meant a lot of professional users relied on external drives and "just upload it" wasn't going to fly over a 9600bps modem, not to mention that hardware cost more and failed more often, which increased the need to move drives between machines simply to get back in service.
> Most drives stay in their host machine until they die.
Or until the host machine dies, in which case they have to be readable in another machine.
(Happened to me recently: the laptop's motherboard stopped working, I used its disk with an external case and an older operating system version while I waited for the warranty replacement.)
> The justification that hard drives need to work everywhere is a little weird. Most drives stay in their host machine until they die.
The article was pretty clear that the context they had in mind for this requirement was servers in a data center, not your home machine:
> Without that requirement, a hard drive might be usable only on the system that created it, which would create a major obstacle for data centers (not to mention data recovery).
Keep in mind they still thought they'd be targeting Alpha processors as late as the Win2K RC's.
How is that a major concern for data centers? I thought drives typically stayed in a data center machine until they died too. And even if you were swapping them around, you'd only have a requirement for a certain newer OS, not the exact same physical hardware.
Ideally drives stay in their server machines in a data center until they die, and maybe even typically... but MS could hardly consider only what was 'typical' when it comes to things like this. And he's writing about engineering decisions made over 15 years ago here: around 1998 it certainly was more common to move hard drives around between machines. I probably did this at least 100 times just working for one company for a couple of years.
That said, it would still be one hell of a weird edge case to need to take a drive out of an x86 Win2k server, drop it into an Alpha Win2k server, and still care about its contents (vs wiping it for a newly provisioned host). But when you are writing OS filesystems, you have to care about edge cases... especially edge cases that may apply to thousands of racks worth of machines.
> it would still be one hell of a weird edge case to need to take a drive out of an x86 Win2k server, drop it into an Alpha Win2k server
I can imagine the reverse being more common though - I worked at a shop around that time where we had a handful of very expensive Alpha NT 4 (and VMS and...) boxes, and a lot of x86 NT boxes. I could imagine the magic smoke being let out of an Alpha and having to drop the drive into an x86 box for data recovery.
My point is that the way drives are typically used means that the original decision doesn't have to be set in stone. A new format could be added, one which requires a recent version of Windows to read and which therefore can take advantage of recent hardware, and that would work just fine.
I'm curious why they haven't moved compression to the disk hardware instead of leaving it up to the operating system. Is it because you only get decent compression if you know the context of the whole file, or is it maybe that the hard drive companies wouldn't be able to market a device like that?
SandForce SSDs did implement compression, which allowed them to write and read less data to the flash, boosting performance. But the SSD still reported its full uncompressed capacity to the OS. This is because the abstraction that storage devices present to the OS is block-based, and a device can't present a varying number of total blocks depending on what data happens to be written to it.
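To illustrate the block-abstraction point (a toy model, nothing to do with SandForce's actual firmware): the drive advertises a fixed number of LBAs, and compression can only shrink what lands on the flash behind that fixed mapping, so the OS never sees any extra capacity.

    /* Toy flash-translation-layer sketch; all names are invented for illustration.
       The logical block count reported to the host is a constant, and compression
       only reduces the physical flash consumed behind each logical block. */
    #include <stdint.h>
    #include <stdio.h>

    #define LOGICAL_BLOCKS 1000000u    /* capacity the OS sees: never changes */
    #define BLOCK_SIZE     4096u

    static uint64_t flash_bytes_used;  /* physical flash actually consumed */

    /* The host always writes BLOCK_SIZE bytes to an LBA; the drive may store less. */
    static void write_block(uint32_t lba, uint32_t stored_bytes)
    {
        (void)lba;                     /* a real FTL would record the mapping here */
        flash_bytes_used += stored_bytes;
    }

    int main(void)
    {
        /* Even at a 2:1 ratio on every block, the advertised capacity stays
           LOGICAL_BLOCKS; the savings show up only as fewer flash writes
           (and more spare area), not as extra reported space. */
        for (uint32_t lba = 0; lba < 1000; lba++)
            write_block(lba, BLOCK_SIZE / 2);

        printf("advertised: %u blocks of %u bytes (fixed)\n", LOGICAL_BLOCKS, BLOCK_SIZE);
        printf("flash used for 1000 logical blocks: %llu bytes\n",
               (unsigned long long)flash_bytes_used);
        return 0;
    }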
Even the fastest compression algorithms like LZ4 need a Core i5-4300U @ 1.9GHz to reach about 385 MB/s [1]. You'd need a pretty powerful setup to keep up with SSD speeds, and you'd also need to be mindful of the heat generated. Also, it would be pretty useless if the volume's encrypted, since encrypted data doesn't compress.
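If you want to sanity-check those numbers on your own hardware, here's a rough LZ4 round-trip benchmark; it assumes liblz4 is installed, and the results will obviously vary with the CPU and with how compressible the test data is.

    /* Rough LZ4 round-trip throughput check (build with: cc lz4bench.c -llz4). */
    #include <lz4.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        const int size = 64 * 1024 * 1024;              /* 64 MB of test data */
        char *src = malloc(size);
        char *dst = malloc(LZ4_compressBound(size));
        char *rt  = malloc(size);
        if (!src || !dst || !rt) return 1;

        /* Mildly compressible input: a repeated phrase XORed with some noise. */
        for (int i = 0; i < size; i++)
            src[i] = "ntfs compression benchmark data "[i % 32] ^ (char)((i >> 12) & 7);

        clock_t t0 = clock();
        int csize = LZ4_compress_default(src, dst, size, LZ4_compressBound(size));
        clock_t t1 = clock();
        int dsize = LZ4_decompress_safe(dst, rt, csize, size);
        clock_t t2 = clock();

        printf("ratio:      %.2f:1\n", (double)size / csize);
        printf("compress:   %.0f MB/s\n", size / 1e6 / ((double)(t1 - t0) / CLOCKS_PER_SEC));
        printf("decompress: %.0f MB/s\n", size / 1e6 / ((double)(t2 - t1) / CLOCKS_PER_SEC));

        /* Sanity check that the round trip was lossless. */
        return (dsize == size && memcmp(src, rt, size) == 0) ? 0 : 1;
    }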
Wait. Aren't the numbers for SSDs, particularly the bus speeds for the SATA connection, measured in gigabits (not bytes)?
Looking at the numbers you linked to (which seem to be in megabytes), it seems to me that decompression speeds could keep up. And I know write speeds on SSDs are a lot slower than the spec'd numbers, so the compression write speeds look plausible to me too.
The general idea is that modern CPUs tend to be data-starved and sit idle because they are waiting on the comparatively slow memory and storage buses that connect everything.
There was a window where HDD performance was marginal enough that compressing data helped fetch it from disk faster, since CPU decompression was quicker than waiting for the data to arrive at full size, especially on things like text where 10:1 compression isn't hard.
Now we're living with SSDs that can do 2GB/s and no CPU can decompress that quickly.
> Now we're living with SSDs that can do 2GB/s and no CPU can decompress that quickly.
I'd totally buy a machine with a state-of-the-art CPU paired with one or two FPGAs that can be programmed as accelerators for crypto, compression, etc.
Intel's working on bundling FPGAs with its Xeon systems, so maybe that will happen, but it's probably better addressed with a dedicated hardware decoder, like what is done for H.264.