I'd say both Spectre 2 and Meltdown are fairly straightforward to fix in hardware without performance degradation (though probably at the cost of a bit more circuitry), so that's most likely what they did.
For Meltdown, you just need to never forward data from the L1$ when the permission check fails. Since address translation already happens at the L1$ lookup, doing the permission check there as well is not a big burden. This is the natural place for it, and it's also why Meltdown specifically affected Intel and not everybody.
For Spectre 2, you need to add some bits to the branch target buffer to tag entries by the privilege level and ideally PCID (to protect user level processes from each other). And then you just don't use BTB entries with the wrong tag.
It'd still be interesting to know the details, e.g. what exactly they changed for Spectre 2.
Your proposal doesn't fix Spectre 2 in general; just the cross-privilege level (or at best cross-process) aspect. Anyone trying to create a sandbox still has to worry about the sandboxed stuff being able to read all of the sandbox memory, for example.
Ideally the ISA would allow user level code to also toggle a flag that affects the BTB. That would allow embedding JIT code while guarding against Spectre 2 (fixing the BTB injection).
I don't think you really need more than that though. Reading arbitrary memory would seem to require a Spectre 1-type attack instead. Unless you have an explicit counterexample in mind?
For the JIT'ed code, Spectre 1 can probably be mostly avoided using the bitmask trick or something equivalent, which we seem to be moving in the direction of anyway (thinking about the WASM execution environment specifically), and for the "host" code inside the sandbox, the same kind of selective software fixes are required as in the kernel.
I suspect you can get arbitrary memory read with Spectre 2 as well; just have to find the right gadgets. Which in a big program of the sort that would have a sandbox inside is somewhat likely...
(I can't comment on specific counterexamples in more detail than that, even if I know of any.)
Spectre 2 is about code in a less trusted domain A introducing an otherwise impossible branch target used by branch prediction on code in a trusted domain B. So a solution that adds tag bits to the BTB to distinguish between those domains should be a 100% effective fix. The only arguments are about how many tag bits & who gets to set them, i.e. a user-mode JIT sandbox would want to be able to control at least one tag bit for this, but if it can, it'd be fully effective.
Unless you start thinking about protecting different JITted code domains in the same address space from each other... I guess at some point you just have to bite the bullet and invalidate the BTB.
I suppose an additional problem would be if the BTB just "hallucinated" random indirect branch target destinations somehow, which could allow untrusted domain A code to jump (in speculation) to code that would normally be unreachable. A kind of "reverse Spectre 2", if you will, although the same tag approach should fix that at the same time as the original Spectre 2.
Spectre 1 attacks from JIT'ed code against the outside code are definitely possible though.
> The only arguments are about how many tag bits & who gets to set them
That's fair. I guess I've seen no indication that bits will be provided by ISAs and OSes to userland (past the bit that distinguishes userland from kernel code)...
> Unless you start thinking about protecting different
> JITted code domains in the same address space from each
> other
That's also fair. As a browser developer, this is my world right now. ;)
It should (depending on how they implement it), as long as the software is re-modified to run the original way on only this platform and later. The performance degradation at least would move into hardware instead of software.
How are they going to fix it without some performance penalty? If the memory is in the cache, isn't there going to be a timing difference? Or will an extra check have to be made, which will slow down the lookup?
From what I understand, this fixes variants two and three, which are cross-ring and cross-process data extraction. I believe their hardware mitigation is to separate the caches between those (or something to that effect). So there will still be a minor performance hit when running multiple identical processes compared to before.
That's my interpretation though, I don't know enough to say that's the correct interpretation.
Does the hardware on x86 even know which cache is owned by which process? It knows which is elevated (kernel Vs. User Vs. Hypervisor, etc) but process isolation on x86 is largely a kernel API construct rather than a hardware one.
Simplistically, it allows TLB-cached page table contents to be tagged with a context identifier, and limits lookups in the TLB to only match within the currently allowed context. TLB-cached entries with a different PCID will be ignored.
I think you are confusing the TLB and processor data caches.
PCIDs exist for virtual addresses and are implemented in the TLB, not the processor caches. The processor caches are physically indexed, not virtually indexed.
So, does the processor cache have a mechanism to tell which "process" to which a given line belongs? Nope. Two processes (or a process and the kernel) that share memory can be in two PCIDs, but share the processor caches.
You are correct. My assumption was that the original question is not focused on the data cache itself, and so the DTLB does qualify for "knowing" what belongs to which context.
We can't really know, since we don't have something to compare it to. Only they know what the performance could have been if the fix were not implemented in silicon, and they won't be releasing new versions of old chips that ONLY have this hardware fix, so we will never be able to compare the two. Only they will ever know, however if we see comparable or worse performance (or just barely better) in these new chips then we can know that there was some performance hit.
>"As we bring these new products to market, ensuring that they deliver the performance improvements people expect from us is critical. Our goal is to offer not only the best performance, but also the best secure performance."
https://newsroom.intel.com/editorials/advancing-security-sil...
It's doubtful Intel would delay their plans much to work on this, and it's even more doubtful the result would be better for anybody: we would have to use the old, completely unpatched chips for longer...
It's better for everybody to have more frequent, gradual fixes, even if the first ones are not 100% complete.
Would this result in 32-bit, x64, and x64-post-2018-CPU versions of software? From my limited understanding, apps (such as the V8 engine) are being compiled with workarounds that won't be needed once the flaws are fixed in hardware.
I'm glad they're going to be releasing microcode fixes for older processors, mainly because my hardware still uses one:
> Finally, Intel will also be going even further back with their microcode updates. Their latest schedule calls for processors as old as the Core 2 lineup to get updates, including the 1st gen Core processors (Nehalem/Gulftown/Westmere/Lynnfield/Clarksfield/Bloomfield/Arrandale/Clarkdale), and the 45nm Core 2 processors (Penryn/Yorkfield/Wolfdale/Harpertown), which would cover most Intel processors going back to late 2007 or so.
I'm still waiting to hear if there's ever going to be a hardware mitigation for Spectre variant 1. It's not accurate to call things "fixed" without it. Is anyone even working on it?
Spectre variant 1 is arguably a software problem. It's not about consuming "bad" data that was placed by an adversary like variant 2, or allowing rogue data cache loads like with Meltdown. It happens as the result of normal execution.
The processor manuals are also very clear that speculation can be wrong and can have observable side-effects[1][2]. The processors also have ways to constrain speculation architecturally (like lfence) in the case that speculation can do something that results in exposure to a vulnerability. Those two things put together make it a software problem _now_, and perhaps forever.
This means that those of us who write security-sensitive software have a long road ahead of us to learn how this affects us and to continue to enhance our defenses. Adding array_index_nospec() calls in Linux, for instance.
1. Intel SDM Vol 3a, 4.10.2.3 Details of TLB Use: "The processor may cache translations required ... for accesses that are a result of speculative execution that would never actually occur in the executed code path."
2. CLFLUSH instruction definition says: "processors are free to speculatively fetch and cache data from system memory regions assigned a memory-type allowing for speculative reads"
This is exactly my fear, that Intel and others will continue to go with the "working as designed" defense. The problem still exists. The design is bad, and users will pay the price with slower software, or less secure software, until the design is fixed.
Disclaimer: I don't work at Intel, nor in any other processor design house, nor in a fab, etc.
And honestly, look at what variant 1 actually is before spreading uninformed opinions. It is fundamental to speculative execution. You can have some form of hardware support, and Intel, AMD, ARM, and probably others have actually begun work on the subject, but it takes the form of instructions that can be added or substituted in the software at the right spots, because there is absolutely no way for sufficiently speculative hardware to enforce security against variant 1 without some software help: it is not an architectural or a microarchitectural problem; it is a fundamental logic problem...
My opinion is that we need extensible tagging in programming languages quickly, for that and other subjects.
I understand variant 1, and I do not share your pessimism. It is not fundamental to speculative execution as a concept, only current implementations.
Speculative execution in modern processors already requires performing heroic feats to enforce correctness in the face of mispredicted branches. Fixing Spectre variant 1 requires more heroic feats of a similar kind (rolling back state changes), and I don't think it is fundamentally impossible given what has already been done.
It won't be easy, and it won't be free, but it needs doing. Nobody thought it was worth doing until now, so nobody tried, but that doesn't indicate that it's impossible.
Spectre variant 1 is only really a problem for JITs, and it's mitigated with masking. Disabling speculation to defeat variant 1 is probably not worth the performance cost.
For those of us who keep computers for many years, these flaws certainly put a dampener on new sales. No way I'd buy a brand new computer today with a known flawed/suboptimal CPU, and be lumbered with that for the next 5-10 years.
> No way I'd buy a brand new computer today with a known flawed/suboptimal CPU
I think at some point you have to give up on the notion that most things will (or even can) ever be perfectly optimal. Whether it's performance, security, reliability, or anything else, everything is flawed in one way or another... it's just a question of how relevant the flaws are for your use cases. In this case I'm not too worried until I hear about an actual Spectre-based attack carried out against a random user in the wild.
All I'm saying is that for a certain set of potential customers, this tips the balance from spending money to not spending money. A handful of people inside Intel and computer OEMs are probably spitting feathers about this circumstance. At the present time, I feel no pressure to accept anything.
I'm not exactly sure what's more "nebulous" about the timing for consumer chips. This timing is specified for both the consumer and server chips in the same sentence in the press release: "These changes will begin with our next-generation Intel® Xeon® Scalable processors (code-named Cascade Lake) as well as 8th Generation Intel® Core™ processors expected to ship in the second half of 2018." (I guess the nebulous thing is exactly which 8th generation Cores will have the update.)
> I guess the nebulous thing is exactly which 8th generation Cores will have the update.
Indeed. Plenty of 8th gen processors have already shipped so it won't be easy to tell which ones are fixed. Also, 8th gen includes Kaby Lake-R, Coffee Lake, Cannonlake, and maybe Whiskey Lake; I doubt all of those will be fixed. And I was expecting Intel to be selling 9th gen in 2H2018.
I am reading about microcode updates, but I'm wondering if Dell doesn't update the BIOS for my (early Core i3, admittedly pretty old) computer am I SOL regardless of what Intel does?
The microcode updates can also be loaded by your OS if they are not installed via a firmware update; the kernel applies them early in the boot process.
See [0] for Linux and possibly [1] for Windows. Not 100% sure whether Windows handles this via Windows Update or not.
One of the most dangerous practical manifestations of this is in datacenters, though, where it can be used to go poking around in other people's VMs.
On the consumer desktop/laptop, there are many other vectors that are a lot easier to use already, not least of which is plain old social engineering. Right now, an in-the-wild virus that tried to use these vulnerabilities would strike me as somebody showboating more than a serious attempt to exploit systems with these methods.
Naturally it goes the other way round: consumer (high end) chips receive new designs first and datacenters get them once they're somewhat battletested.
Yes. Most reports placed time-to-available-hardware in the low years (a whole dev cycle). Perhaps those were early sensationalist reports (this article would suggest they were!), but it is a surprise if you only read them.
Most of the long-horizon reports that I read centered around Spectre 1 (intra-process OOB reads by racing bounds check against conditional branch prediction, currently mitigated by lfence), which is still a fundamental-principles issue that would be exceptionally challenging to solve without either substantial performance loss or some form of userland/compilation-level workaround. Intel still haven't indicated a hardware fix for that one. I expect that most mitigations for this will center around improving static analysis tooling and giving the compiler finer-grained control around fencing to reduce performance impact.
Variant 2 (branch target injection, with retpoline-style software mitigation) is the most interesting. There are obvious theoretical hardware mitigation strategies, like tagged BTB entries. To me it's very cool, but not particularly surprising, that Intel believe they'll have pure-hardware mitigations for that soon.
And Meltdown was frankly an Intel mistake that would have been shocking if left unfixed in the next hardware revision.