Zen 2 Missives – AMD now delivering efficiencies that are double that of Intel (backplane.com)
402 points by olvy0 on Aug 25, 2019 | 211 comments



I am surprised the article doesn't talk about Intel CPU performance degradation due to speculative timing attack workarounds. The bugs are more severe on Intel CPUs and as such, Intel CPUs took a much bigger performance hit.


Here here...

This is a hidden thing that seems to always go under the radar.

My Intel i7 4790k has lost at least 40% of its speed since I originally bought it. For years I thought my Intel machines were getting slower without any actual proof, whereas the AMD machine I returned to after years away felt as fast as ever.

Now with microcode updates and similar getting more coverage, I'm certain that this is what it is.

AMD's partner program is meh, whereas Intel is really partner focused - if anything breaks, I can get a replacement shipped to me in advance the next day for ~3-5 years (component depending) - but I am seeing much more demand for AMD as of late.


Yeah they silently killed my overclock 9 months ago by limiting the 6850k to 38x multi due to "Spectre Mitigation". It's the worst BS ever...

If I remove the Intel microcode dll from my Windows system folder then I get my 41x back, but there's been no documentation forthcoming. Just finding out why speeds were nerfed required lots of research.

Having said that I can see that Asus may have finally released the required motherboard update for this. Updating requires me to reset everything by hand, as Asus doesn't let me simply back up settings - they are invalidated between bios versions. Between that and eating CPUs if using XMP, it's the worst motherboard I ever bought.

https://www.asus.com/us/Motherboards/ROG-STRIX-X99-GAMING/He...

Needless to say my days of building Intel systems are over.


I had a system boot loop on an Intel Core i system because a microcode update "disallowed" overclocking (this happened well before the Spectre/Meltdown debacle).

The only solution was to prevent the os from loading the bad microcode at boot - I'm guessing that's what removing those .dlls does.


Eating CPUs?


The automatic XMP settings upped one of the more obscure voltage lines (new to me since my last CPU, the 2600k, which still runs perfectly at 4.4GHz as my son's PC) to dangerously high levels. I received two CPU replacements under warranty before I found the particular setting that needed to be put under manual control. I think I was a third of the way to the third CPU dying.

This was incredibly sloppy work by Asus.

So yeah, eating CPUs.


I see I got a few upvotes, so this must resonate here - thanks, wanted to back this up with specific details:

The CPU was Broadwell-E on the X99 platform. VCCSA ran at ~1.33V, when the maximum should have been 1.25V. VCCIO was also at 1.25V, when it would run happily at 1.096V.

The lesson was to go through every single voltage setting and research safe ranges, and not trust auto for _anything_. There were changes in the CPU memory controllers around this time, and that's what caused the CPU to die when unsafe voltages were applied; Asus' XMP settings for my 4x8=32GB 3200 Trident Z DDR4s (17-18-18-36) turned the CPUs to mush.

I don't have a degree in electrical engineering; it seems like Asus wanted you to have one.

Looking back I'm discovering more details... and another guy who experienced essentially the same, with others backing up the instability of the platform: https://forums.anandtech.com/threads/my-6850x-just-died-just...

Whilst I want to move over to AMD, I'd like my current 6850k to last longer as my son's computer. It seems like I won the silicon lottery with my old 2600k that he's currently using, though; it runs a fixed 4.4GHz, which is a very nice upgrade over the stock 3.4GHz.

The original article in this thread is music to my ears - for AM4, just add a great cooler and away we go. It's tempting to upgrade both computers at the same time, which kind of wrecks my hand-me-down plan, but I'm sure the 6850k got degraded by those 3 weeks on the bad settings, as I've had lots of issues with it (unlike the 2600k).

(Final edit: hence my disdain for this board's inability to back up BIOS settings across BIOS updates, unlike other boards I've had - an update is exactly what I should do now to get my overclock back, but if I miss even a single one of the voltage settings I can expect it to fry the CPU, unless they actually fixed their sh*t in one of the BIOS updates.)


My Strix X99 just kicked the bucket a week ago...is this possibly why? Shit, I've had it on XMP since I built it 3 years ago.

I've already gone through one CPU, about 4 months after I built it. Now the whole machine is just dead with no signs of life. I bought a new board assuming the board was bad but maybe I need to test the CPU now...

Agreed though, one of the worst boards I've ever bought and seriously rethinking all Asus purchases in the future.


Yeah, builds on my i7 Linux laptop are noticeably faster with Spectre mitigations disabled.

Pretty sad that mitigations against a largely theoretical attack hurt performance for everyone. Meanwhile people continue to open e-mail attachments with .scr extensions.


It's only theoretical because the majority of people are patched. Given that the possible vulnerability has been discussed since Core was first released, I'd bet that certain 3-letter organizations have used this and similar exploits in the past against high-value systems.


My (perhaps naive) understanding is that, for Spectre to be truly exploitable, an attacker must be able to execute code locally (I know of the JS POC but there are browser mitigations). But if an attacker has an account in my system I'm already screwed anyway, as there are probably other privilege escalation vectors that are easier to exploit.

My laptop is not high-value so I'd rather just disable that stuff. I once timed a C++ build, with and without Spectre mitigations, and the difference was something like 20%.
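
(For anyone wanting to try the same experiment: kernels 5.2+ - and the stable kernels that backported it - collect the individual nospectre_v2/nopti style flags under a single boot parameter. Only do this on machines where you accept the risk; the file below is the Debian/Ubuntu location, adjust for your distro.)

    # /etc/default/grub, then run update-grub (or grub2-mkconfig) and reboot
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"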


Privilege escalation is the biggest issue I think. Users on modern systems don't actually run as admins by default which makes it much harder for malicious software to compromise the entire system.

Consider the average user. You can remove almost all malware and viruses because they only install and run with user permissions. Without spectre mitigations, it wouldn't be very long before all that malware was installing itself at deeper levels where the only solution is blowing everything away completely.

I'm sure there is also a DRM argument somewhere for people who like that kind of thing.


JavaScript runs locally. E.g. a quick search turns up: https://github.com/cgvwzq/spectre


Relevant: make-linux-fast-again.com


Nice, thanks!


'Hear hear'


:P Thanks!


Huh, I was under the impression that mitigation of speculative execution vulnerabilities was a one-time fix that has been implemented, accounted for and forgotten years ago. I'm surprised to hear it's still a factor in anything.


New ones have been discovered regularly (more or less one every 2-4 months). Some affect AMD as well, but all affect Intel.


If only... There have been several BIOS updates for my laptops: Meltdown, Spectre, and the most recent one, again something to do with side-channel attacks. Plus Microsoft's updates.


>if anything breaks, I can get a replacement shipped to me in advanced the next day for ~3-5 years (component depending)

A 40% speed reduction does not really count as "anything breaks", does it? Even though some previously viable workloads effectively stopped working.


40%?? In what workload?


I share your incredulity. 40% sounds like total BS unless you designed benchmarks to maximally highlight potential speed loss from Spectre & Meltdown mitigations.


Chiplets are what give AMD an absolutely brutal advantage. Their high end chips do not need expensive large dies -- just a few small ones. Yields are much better, and they can bin each chiplet separately. Oh, and they don't spend the expensive top-notch process on the I/O part of the CPU either. Intel might be hard pressed to catch up to this -- sure, the 7nm EUV process in two years and a bit will very likely be a serious jump in IPC, but if you are comparing similarly priced server CPUs, even that is very likely to be simply not enough because of this chiplet strategy. For the foreseeable future, inertia alone is the only reason for anyone to buy an Intel server chip.


So the chiplet strategy is clearly paying dividends for AMD, but I'm curious what has changed to allow this idea to be so effective. It's not like the concept of multi-die CPUs is new; Intel even implemented it in their legendary Q6600, which was basically two Core 2 Duo dies on a single package. The issue with the approach used with the Q6600 was that communication across cores on separate dies was orders of magnitude slower than communication across cores sharing a die. Is AMD's success down to recent advancements in branch prediction and core scheduling optimization?


> The issue with the approach used with the Q6600 was that communication across cores on separate dies was orders of magnitude slower than communication across cores sharing a die. Is AMD's success down to recent advancements in branch prediction and core scheduling optimization?

The biggest problem with the Q6600 is that it was a big hack: the two dies communicated over the frontside bus, something designed for nice slow memory access that didn't have the bandwidth or the low latency needed for inter-core communication.

Infinity fabric, on the other hand, is good enough that on Zen 2 they didn't even bother to directly connect the two CCX that share a die.


But once they do get on the chiplet train properly, their EMIB technology means they only need a small 'bridge' piece of silicon to connect their chiplets, in contrast to what TSMC offers, where all the chiplets need to sit on a large silicon interposer. That is more expensive and it limits the size of the chip you can produce (as there is a limit to the interposer size). They also announced that Foveros stacking thing, and now co-EMIB too; on paper Intel's new packaging tech looks like it could really help them. They just need to make use of it.


Some chiplets don't need an interposer or EMIB (we used to call this MCM back in the day but whatever).


When one of the chiplets is responsible for IO and the others all need to talk to it, like AMD are doing, they do. I guess you could go down through the package and back up again like the olden days if you wanted a much worse solution.


Please look at an AMD processor before digging yourself deeper.


Intel announced a 56-core server CPU (presumably purely to try to keep up with AMD), which is just two 14nm 28-core dies “glued” together like Zen 1. But AMD clearly has a lead on getting this approach to market and optimising it into their current chiplets.


The craziest thing about that 56-core CPU is its 400 watt TDP. Meanwhile 64-core Zen2 EPYCs top out at 225 watts.


Not the rumored $50k price tag for the -M variant, the one that doesn't have its max memory artificially limited?


I had no idea. That's... uh... why?

I guess part of this nonsense is the consolidation of cloud vendors. Intel is selling CPUs the way IBM sold mainframes in the 1960s.


If an uncrippled 28-core is $17K then putting two of those in one package is going to be over $35K.


Unlocking the memory capacity is a flat fee regardless of how many cores, so I think a better comparison for the price premium is $20k locked / $28k unlocked.


Intel has had a 56-core Xeon for a while, but there's been essentially no interest from OEMs.

I don't think you can even get it outside of custom-built HPC machines.


The dual-die 56-core CPU isn't socketed; it's a massive BGA-soldered package.


Back in 2007 at Microsoft I saw an 80-core Intel prototype. Didn’t really do much of anything, and there was only 1 of them, but it’s interesting seeing some of these old research projects come to fruition


That would be https://en.wikipedia.org/wiki/Teraflops_Research_Chip - looks like Intel's take on the Transputer.


Thank you!


> Intel might be hard pressed to catch up to this

We were constantly saying this about AMD until Zen, and Zen is largely credited to Jim Keller. Where did Jim Keller go after Zen? Intel. I'm lowkey afraid that AMD will run out of steam after one or two gens and Intel+Keller will have just finished developing an insane architecture that brings us right back to the pre-Zen era.


>Where did Jim Keller go after Zen? Intel.

He didn't go straight to Intel - you missed Tesla in between. And anything he is doing will be 2022+ at the earliest. Intel already has a roadmap of chips that were delayed by the fabs, and that runs at least up to 2022/2023.

I am pretty sure AMD has many competent people too. And as far as I can tell, engineering talent is rarely the problem at any big tech company. There are lots of insanely great engineers at Microsoft and even Oracle that most people have never heard of, but most of that talent is hampered by corporate politics and culture. And so far AMD seems to have a far better culture under Dr Lisa Su than Intel does.


It's a cycle we've seen before. Just enjoy the ride and take the benefit while it's there. Hopefully things are different this time, but many of us remember when AMD beat Intel to 1GHz and also the major win that was amd64 and the overclocking madness that came with Barton. This industry is cyclical in nature. I am an AMD fanboi from the K6-2 days, so I am a restrained optimist about the current wins.


> This industry is cyclical in nature.

I mis-read it as "cynical in nature", and it still fits. May the best process win.


People who know these things have said that Zen was principally designed by a different individual, not Keller. I'm trying to find the link/who that person is.


Mike Clark. I have no inside information but the rumor I have read is that Keller was mostly responsible for the cancelled AMD ARM effort while Clark did Zen. Clark is explicitly credited as the architect/principal designer for Zen 2.


Mike Clark (~tech lead) and Suzanne Plummer (~engineering group lead) are usually mentioned as the most important people.


I'd really like that, even though it would put Intel in the driver's seat again - because I honestly don't see where single-core performance is going to come from in the future. IPC increased because CPUs exploited the inherent parallelism in normal code, where calculations often don't depend on the value of the previous calculation. But now it seems to me like we're pretty much at the limit of that; adding more integer units/prefetchers/etc. isn't going to give us the growth we're used to. Cache is something with a decent amount of potential, but once that is gone, what are they going to do?


Well that's how competition works.


Is it the future of SoCs?


Definitely. Intel has said they're going chiplet with some talk about 3D chips too.

https://www.anandtech.com/show/14211/intels-interconnected-f...

Hotchips is full of chiplet talk. TSMC specifically calls out AMD, Xilinx, and Nvidia when talking about chiplets.

https://www.anandtech.com/show/14770/hot-chips-31-keynote-da...

Nvidia's Hotchips highlight is an AI accelerator with 36 chiplets on board (and interestingly, with a RISC-V controller -- probably the one they've been putting in their graphics cards).

https://www.anandtech.com/show/14767/hot-chips-31-live-blogs...

A chip can't bin faster than its slowest core. Getting 28 cores that run at top speed is almost impossible (even with Intel's super-stable 14nm and many chip design iterations optimized for the process). In contrast, getting 8 cores at a high speed happens much more often.

Likewise, a given manufacturing process is going to have an average of N errors per wafer. Duplicating parts of the chip can help, but it increases cost and some parts simply aren't economical to duplicate. If the error is in cache, you can probably laser off a cache block and move on. If the error is in an ALU, you probably aren't as lucky and will have to laser off an entire core (worse errors may even cut out an entire group of cores). A huge 28-core chip has a very high probability of containing an error, meaning that you end up with a bunch of 26-27 core chips but far fewer full 28-core chips.

What are the chances of finding that magic chip with 28 defect-free cores where all of them also run at high speed? This is one reason why Intel has so many SKUs. AMD also sells the same chip to both consumer and server, so they can take the best of the best and ship them to servers. If there's a defective core or two, laser them out and sell one of those 12-core desktop parts. If it's missing a core and doesn't clock high enough, sell it as a 6-core 3-series chip.
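
To put rough numbers on that, here's a toy Poisson yield model (all constants are invented for illustration; real defect densities and per-core areas aren't public):

    import math

    # P(zero defects on a die) = exp(-defect_density * die_area),
    # the classic Poisson yield approximation. Numbers are made up.
    def fully_intact_fraction(cores, core_area_mm2=8.0, defects_per_mm2=0.005):
        return math.exp(-defects_per_mm2 * cores * core_area_mm2)

    print(f"28-core monolithic die, all cores good: {fully_intact_fraction(28):.0%}")  # ~33%
    print(f"8-core chiplet, all cores good: {fully_intact_fraction(8):.0%}")           # ~73%

With the same (invented) defect density, most small chiplets come out perfect while most big monolithic dies need at least one core fused off - before you even start binning for clocks.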

The ability to split fab nodes is also huge. Half of the new Zen 2 chip is 14nm. If they had to make that on 7nm as well, their supply would be halved and prices would go up substantially due to increased chip cost and decreased chip supply. If an interposer is needed, it can use an even older process (I think AMD used 65nm for their Fury HBM interposer, though something even older like 180nm would probably work just fine for wire-only interposers).


> A chip can't bin faster than its slowest core. Getting 28 cores that run at top speed is almost impossible

Of note here is that Intel tried to identify which cores on a die could run fastest so that low thread count workloads could use only the best cores on the chip, but the software side was a disaster and they have pretty much given up on the concept.

> AMD also sells the same chip to both consumer and server, so they can take the best of the best and ship them to servers.

They may also be sending the most efficient chiplets to the server parts that have so many chiplets on each package, while keeping some of the fastest chiplets for the consumer parts. There's really not much precedent in recent generations for having one piece of silicon span such a wide range of products.


A chip can't bin faster than its slowest core.

Intel and AMD recently started binning individual cores, although I don't think it has hit Xeon yet.


The author's comments on cache sizes are a bit reductive. Not all "L3" is created equal, and designers always make tradeoffs between capacity and latency.

In particular, the EPYC processors achieve such high cache capacities by splitting L3 into slices across multiple silicon dies, and accessing non-local L3 incurs huge interconnect latency - 132ns on latest EPYC vs 37ns on current Xeon [1]. Even DDR4 on Intel (90ns) is faster than much of an EPYC chip's L3 cache.

Intel's monolithic die strategy keeps worst case latency low, but increases costs significantly and totally precludes caches in the hundreds of MB. Depending on workload, that may or may not be the right choice.

[1] https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7


In practice the large AMD L3s result in very good performance. The new Ryzen cpus for instance absolutely crush intel cpus at GCC compile times because of them ( https://www.youtube.com/watch?v=CVAt4fz--bQ )

Are there workloads where AMD suffers due to its L3 design? Maybe, but I've not seen one yet. For something special like that, I would imagine you could try to arrange thread affinity to avoid non-local L3 accesses.

On my 3900x L3 latency is 10.4ns when local.


> Are there workloads where the AMD suffers due to its l3 design?

Databases, particularly any database which benefits from more than 16MB of L3 cache.

> On my 3900x L3 latency is 10.4ns when local.

And L3 latency is >100ns when off-die. Remember, to keep memory coherent, only one L3 cache can "own" the data. You have to wait for the "other core" to give up the data before you can load it into YOUR L3 cache and start writing to it.

It's clear that AMD has a very good cache-coherence system to mitigate the problem (aka Infinity Fabric), but you can't get around the fundamental fact that a core only really has 16MB of L3 cache.

Intel systems can have all of its L3 cache work on all of its cores, which greatly benefits database applications.

---------

AMD Zen (and Zen2) is designed for cloud-servers, where those "independent" bits of L3 cache are not really a big problem. Intel Xeon are designed for big servers which need to scale up.

With that being said, cloud-server VMs are the dominant architecture today, so AMD really did innovate here. But it doesn't change the fact that their systems have the "split L3" problem which affects databases and some other applications.


> Databases, particularly any database which benefits from more than 16MB of L3 cache.

Yes but have you seen this actually measured, as being a net performance problem for AMD as compared to Intel, yet? I understand the theoretical concern.


https://www.phoronix.com/scan.php?page=article&item=amd-epyc...

Older (Zen 1), but you can see how even an AMD EPYC 7601 (32-core) is far slower than an Intel Xeon Gold 6138 (20-core) in Postgres.

Apparently Java-benchmarks are also L3 cache heavy or something, because the Xeon Gold is faster in Java as well (at least, whatever Java benchmark Phoronix was running)


What I see there is that the EPYC 7601 (first graph, second from the bottom) is much faster than the Xeon 6138 -- it's only slower than /two/ Xeons ("the much more expensive dual Xeon Gold 6138 configuration"). The 32-core EPYC scores 30% more than the 20-core Xeon.


There's a lot of different benchmarks there.

Look at PostgreSQL, where the split-L3 cache hampers the EPYC 7601's design.

As I stated earlier: in many workloads, the split cache of EPYC seems to be a benefit. But in DATABASES, which is one major workload for any modern business, EPYC loses to a much weaker system.


Thanks, perfect! I'll keep an eye on these to see how the new epycs do.


Are their L3 slices MOESI like their L2s are (or at least were)? That'd let you have multiple copies in different slices as long as you weren't mutating them.


AMD is using MDOEFSI, according to page 15 of: https://www.hotchips.org/wp-content/uploads/hc_archives/hc29...

However, I can't find any information on what MDOEFSI is. I'm assuming:

* Modified
* Dirty
* Owned
* Exclusive
* Forwarding
* Shared
* Invalid

Any information I look up runs into an NDA firewall pretty quickly (be it in performance counters or hardware-level documentation). It seems like AMD is highly protective of their coherency algorithm.

> That'd let you have multiple copies in different slices as long as you weren't mutating them.

Seems like the D(irty) state actually allows multiple copies to be mutated. But it's still a "multiple copies" methodology. As any particular core comes up to the 8MB (Zen) or 16MB (Zen2) limit, that's all it gets. There's no way to have a singular dataset with 32MB of cache on Zen or Zen2.


Is that really correct? That's huge latency for something that's in the same package. You can buy discrete SRAM with 70ns latency.


OP said only non-local L3 is 132ns. Local L3 (i.e L3 close to the core) is way faster, and the core would usually use local L3 cache.


Oh I see - a tiny NUMA system within the package.


Kind of.

In general, all Zen generations share two characteristics: cores are bound into 4-core clusters called CCXes, and two of those are bound into a group called a CCD. Chips (Zen 1 and 1+) and chiplets (Zen 2) have only ever put one CCD per chip(-let), and 1, 2, or 4 chip(-lets) have been put on a package per socket.

In Zen 1 and 1+, each chip had a micro IO die, which contains the L3, making a quasi-NUMA system. Example: a dual processor Epyc of that generation would have one of 8 memory controllers reply to a fetch/write request (whoever had it closest, either somebody had it in L3 already, or somebody owned that memory channel).

L3 latency on such systems should be quoted as an average, or as a best case/worst case pair. Stating L3 latency as worst case only ignores cache optimizations (prefetchers can grab from non-local L3, and fetches from L3 don't compete with the finite RAM bandwidth but add to it, leading to a possible 2-4x increase in performance if multiple L3 caches are responding to your core). In addition, Intel has similar performance issues: RAM on another socket also has a latency penalty (the nature of all NUMA systems, no matter who manufactured them).

Where Zen 1 and 1+-based systems performed badly is when the prefetcher (or a NUMA-aware program) did not get pages into L2 or local L3 cache fast enough to hide the latency (Epyc had the problem of too many IO dies communicating with each other, Ryzen had the issue of not enough (singular) IO die to keep the system performing smoothly).

Zen 2 (the generation I personally adopted - wonderful architecture) switched to a chiplet design: it still retains dual 4-core CCXs per CCD (and thus per chiplet), but the IO die now lives in its own chiplet, thus one monolithic L3 per socket. The IO die is scaled to the needs of the system, instead of statically growing with additional CCDs. Ryzen now performs ridiculously fast: it meets or beats Coffee Lake Refresh performance (single- and multi-threaded) for the same price, while using fewer watts and outputting less heat. Epyc now scales up to ridiculously huge sizes without losing performance in non-optimal cases or getting into weird NUMA latency games (everyone's early tests with Epyc 2 four socket systems on intentionally bad-for-NUMA workloads illustrate a very favorable worst case, meeting or beating Intel's current gargantuan Xeons in workloads sensitive to memory latency).

So, your statement of "a tiny NUMA system within the package" is correct for older Zens, not correct (and, thankfully, vastly improved) for Zen 2.


Which EPYC 2 four socket systems? I don't think those exist.


Sorry, I misspoke: dual socket Epycs compared to four socket Xeons. Intel may be following AMD and abandoning >2 sockets as well.


Yeah. I bet part of why there's so much L3 per core group is that it's really expensive to go further away.

Seems like there're at least two approaches for future gens: widen the scope across which you can share L3 without a slow trip across the I/O die, or speed up the hop through the I/O die. Unsure what's actually a cost-effective change vs. just a pipe dream, though.


It's maybe the latency to bring the whole cache line over.


OP appears to be talking about change of ownership of a line, not merely bringing it across.


When you access L3, you're not just accessing some memory.


I'm very confused; there appear to be several conflicting reports on L3 cache latency for EPYC chips [1] [2]. Is it the larger random cache writes that are causing the additional latency?

Regardless I wouldn't be particularly concerned, cache seems like the easier issue to address vs power density.

[1] https://www.tomshardware.com/reviews/amd-ryzen-5-1600x-cpu-r...

[2] https://www.tomshardware.com/reviews/amd-ryzen-7-1800x-cpu,4...


> Is it the the larger random cache writes that are causing the additional latency?

Think of the MESI model.

If Core#0 controls memory location #500 (Exclusive state), and then Core#32 wants to write to memory location #500 (also requires Exclusive state), how do you coordinate this?

The steps are as follows:

#1: Core#0 flushes the write buffer, L1 cache, and L2 cache so that the L3 cache & memory location #500 is fully updated.

#2: Memory location #500 is pushed out from Core#0 L3 cache and pushed into Core#32 L3 cache. (Core#0 sets Location#500 to "Invalid", which allows Core#32 to set Location#500 to Exclusive).

#3: Core#32 L3 cache then transfers the data to L2, L1, and finally is able to be read by core#32.

--------

EDIT: Step #1 is missing when you read from DDR4 RAM. So DDR4 RAM reads under the Zen and Zen2 architecture are faster than remote L3 reads. An interesting quirk for sure.

In practice, Zen / Zen2's quirk doesn't seem to be a big deal for a large number of workloads (especially cloud servers / VMs). Databases are the only major workload I'm aware of where this really becomes a huge issue.
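
If it helps, here's a toy model of that handoff (purely illustrative - real Zen coherence is MDOEFSI and far more involved, and the latencies are just round numbers from this thread):

    LOCAL_L3_NS = 10     # ballpark local L3 hit, per the 3900x figure above
    FABRIC_HOP_NS = 100  # ballpark extra cost of pulling the line off-die

    class ToyL3Slice:
        """One CCX's slice of L3, tracking MESI-ish line states."""
        def __init__(self):
            self.lines = {}  # address -> "M"/"E"/"S"/"I"

        def write(self, addr, other_slice):
            """A core behind this slice wants Exclusive access to addr."""
            cost = LOCAL_L3_NS
            if other_slice.lines.get(addr) in ("M", "E"):
                # Steps 1-2: the other slice flushes and invalidates its copy,
                # and the line migrates across the fabric into this slice.
                other_slice.lines[addr] = "I"
                cost += FABRIC_HOP_NS
            self.lines[addr] = "E"   # Step 3: we now own it and can write
            return cost

    ccx_a, ccx_b = ToyL3Slice(), ToyL3Slice()
    print(ccx_a.write(500, ccx_b))  # 10  - nobody else owned location #500
    print(ccx_b.write(500, ccx_a))  # 110 - ownership has to migrate off-die

Ping-pong a hot line between cores on different dies and every single write pays that migration cost.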


Very interesting info on the overclocking difference between TSMC 7nm and Intel 14nm+++ however a few misconceptions:

- Intel staying at low core counts probably wasn't evil intent: the software wasn't there and Intel had better single-thread perf than AMD. AMD was basically forced into more cores earlier because of weakness in single thread. Today, the software _is_ there (well, mostly) and we can all take advantage of more cores.

- Why did Intel fall behind? Easy: Brian Krzanich's hubris pushed the process too hard, taking many risks, and the strategy failed spectacularly.

- PCIe Gen4 does matter. M.2 NVMe has been read limited for a long time already (NAND bandwidth scales trivially). The I/O section of this article is basically nonsense.

- There is nothing magical about x86, nor about the AMD and Intel design teams. If the market is there, there will be competitive non-x86 alternatives. The data center market is pretty conservative for good reason - but ML is upending a lot of conventional wisdom so it'll be interesting to see what happens.


> PCIe Gen4 does matter. M.2 NVMe has been read limited for a long time already (NAND bandwidth scales trivially). The I/O section of this article is basically nonsense.

I think the author's point was that storage is already plenty fast enough for many tasks. Personally, I can't feel the difference in storage performance between my NVMe system and older SATA SSD ones, despite NVMe being much faster.


> I cant feel the difference in storage performance between my NVMe

That's because for the most common tasks, real performance has not improved.

Manufacturers always display read/write speeds measured with many cores and a huge queue depth. The typical use case for that is copying files; it's around 3000MB/s.

When starting the OS or loading a program, you have tasks that use a low queue depth and thread count, which gets you about 60-70 MB/s.


Genuine questions. I just don't know this area...

> Manufacturers always display the write/read speed using many cores...

Why does #cores matter? Once the IO is initiated it's handed off to some DMA/coprocessor stuff so the core can get back to starting other IOs without contention. Being clueless I can't see why a single core couldn't saturate IO bandwidth by just blasting out IO requests.

> ...and with huge queue depth.

Really stupid question now, I thought disk queues were queued up requests caused by issuing a buttload of IO requests, so why does a disk queue of say 20 actually make things any faster over a disk queue of just 2?

I know (correction, believe) with spinny disks a longer queue could be used by the controller to optimise access on the disk surface by proximity, but with SSDs that performance characteristic doesn't exist (access is uniform anywhere, I think) so that optimisation doesn't apply.


> Why does #cores matter?

If you're benchmarking 4kB IOs, then the system call and interrupt handling overhead means you can't keep a high-end NVMe SSD 100% busy with only a single CPU core issuing requests one at a time. The time it takes to move 4kB across a PCIe link is absolutely trivial compared to post-Spectre/Meltdown context switch times. A program performing random IO to a mmapped file will never stress the SSD as much as a program that submits batches of several IO requests using a low-overhead asynchronous IO API.

> so why does a disk queue of say 20 actually make things any faster over a disk queue of just 2?

Because unlike almost all hard drives, SSDs can actually work on more than one outstanding request at a time. Consumer SSDs use controllers with four or eight independent NAND flash memory channels. Enterprise SSD controllers are 8-32 channels. And there's some amount of parallelism available between NAND flash dies attached to the same channel, and between planes on an individual die. Also, for writes, SSDs will commonly buffer commands to be combined and issued with fewer actual NAND program operations.
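
Little's Law makes the queue-depth part concrete: outstanding requests ≈ IOPS × per-request latency, so a single thread doing one synchronous read at a time can't come close to a drive's rated numbers (figures below are illustrative, not any specific drive's spec):

    target_iops = 500_000      # a high-end NVMe drive's 4kB random-read rating
    per_io_latency = 80e-6     # ~80 microseconds for one 4kB read, end to end

    print(target_iops * per_io_latency)  # -> 40.0 requests must be in flight
    print(1 / per_io_latency)            # -> ~12500 IOPS max at queue depth 1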


Thanks, very informative!

One more thing: I'm surprised that 4KB blocks are relevant. I'd have thought that disk requests in benchmarking (edit: 'cos manufacturers like to cheat), and a lot of reads in the real world, would use much larger request sizes than 4K.

Is it that IOs are broken down to 4K blocks at the disc controller level, or is that done deliberately in benchmarking to stress the IO subsystem?


SSDs are traditionally marketed with sequential IO performance expressed in MB/s or GB/s, and random IO performance expressed in 4kB IOs per second (IOPS). Using larger block sizes for random IO will increase throughput in terms of GB/s but will almost always yield a lower IOPS number. Block sizes smaller than 4kB usually don't give any improvement to IOPS because the SSD's Flash Translation Layer is usually built around managing data with 4kB granularity.
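
The two ways of quoting the same drive are easy to convert between, which is why the block size matters so much (illustrative numbers):

    iops_4k = 500_000
    print(iops_4k * 4 * 1024 / 1e9)       # ~2.0 GB/s doing 4kB random reads

    iops_128k = 25_000
    print(iops_128k * 128 * 1024 / 1e9)   # ~3.3 GB/s at 128kB: more GB/s, far fewer IOPS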


> expressed in 4kB IOs per second (IOPS). Using larger block sizes for random IO will increase throughput in terms of GB/s but will almost always yield a lower IOPS number

and higher IOPs numbers give the marketing department the warm and fuzzies. Got it.


It is a good metric because it is a measure of random access. Bigger requests are a mixed measure of sequential performance and random access (and you can basically infer the performance for any IO size from the huge-request bandwidth and the smallest-reasonable-IO IOPS).


> Personally, I cant feel the difference in storage performance between my NVMe system and older SATA SSD ones, despite NVMe being much faster.

For the tasks I do, this is night and day, and lack of NVMe is reason enough to warrant buying new hardware.


We are building a new server cluster and it looks like Gen4 PCIe will give us a 40% performance boost for what is basically NVMe over Fabrics using Mellanox 100Gb cards (Azure HCI aka S2D). Next year we will be building out another cluster and will hopefully use Mellanox 200Gb cards, which is only possible due to Gen4.

Performance tests here https://www.servethehome.com/amd-epyc-7002-series-rome-deliv...


Ah, but that’s making assumptions about the user. There are lots of power users for whom IO bandwidth matters; high-end video editing for example.


And low core counts are not?! We have been core-starved for almost a decade!

IO matters but peak bandwidth sequential reads are not such a limiting factor even for power users.


They - and any other subset - can get the PCIe 4 SSDs of course.

There are "lots" of other subsets where it stopped mattering a while ago though.


> PCIe Gen4 does matter. M.2 NVMe has been read limited for a long time already (NAND bandwidth scales trivially). The I/O section of this article is basically nonsense.

It probably does for a great many special use cases on servers or other specialty workloads. But for programming/gaming/normal productivity workloads it really doesn't. The net difference in load times/compile times/boot times is tiny, even going from a cheap SSD to the latest PCIe 4 SSDs.

If you do huge sequential reads/writes, then yeah, large differences are getting unlocked.


Has anyone tried adding branch-prediction-style speculation to disk operations? While it would confuse a network or other peripheral, a preemptive random read from a disk ought to have minor impact.


Operating systems have supported readahead for a long time. It's even more benefit for spinning rust than SSDs because the benefit of a correct prediction is larger when the drive is slower. But it's also a trade off, because if you predict wrong you're wasting scarce I/O resources fetching useless data.
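
You can also ask for it explicitly rather than relying on the kernel's guess - e.g. from Python on Linux (the path is just an example):

    import os

    fd = os.open("/var/log/syslog", os.O_RDONLY)
    # Hint: we're about to need the first 1 MiB, start fetching it now.
    os.posix_fadvise(fd, 0, 1 << 20, os.POSIX_FADV_WILLNEED)
    data = os.read(fd, 1 << 20)   # hopefully served from the page cache by now
    os.close(fd)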


I wonder how much of Intel’s stagnation is due to their bringing CPU engineering back to the US from Israel. In other words putting the geniuses behind Netburst and Itanic in charge.


Can you elaborate? Do you know why the repatriation was decided? Can you explain why Netburst and (I’m guessing this is a pun on) Itanium were failures? Do you know what Israeli teams did better?


Netburst was the Pentium 4 architecture that introduced very long execution pipelines to permit high clock speeds, on the theory that they could outrun the inefficiency. Latency, power, and temperature, however, became limiting factors.

Alongside Netburst development, Intel had the Israeli team develop a mobile-first platform. The team took parts of the bus from Netburst and shit-canned the rest of the design in favor of the Pentium III, which they optimized the shit out of. They introduced concepts like micro-op fusion and developed the processor with the goal of sleeping whenever possible to conserve power.

The net result was the Banias platform which was incredibly efficient because it was incredibly fast. The original Core line was basically Banias with power saving turned off.


From memory, the Netburst team ignored power efficiency until power density concerns hard-capped their frequency targets (and this was on 180/130/90 nm!).

Itanium is often lazily pointed to as a failure by Intel, but in reality it was a hard task with decent execution.

Intel attempted to build an ecosystem around an entirely new architecture. Emphasis on smart-compiler extracted instruction level parallelism, supported by wide but less superscalar hardware. In 2001.

As an analogy, this is kind of like asking every driver to toggle a switch to manage the engine mapping while they drive. It could be more effective, but it's also a ton of work and relies on expertise being developed.

AMD delivered what the market really wanted in x64: something that looked exactly like x86, with a few modest improvements, and some 32-bit scaling limitations removed. But no big changes.

The Israeli teams are usually cited as the caretakers of the P6 (Pentium Pro) legacy, which evolved into the Pentium M, and eventually the Core architectures.

They chose to optimize for power efficiency, and so didn't hit the same power caps the Netburst team did.

Generally, the internet comments on this are armchair generaling with the benefit of hindsight. There are many reasons things could have gone differently.

Sometimes two teams work equally hard, but one chooses a path that leads to success and the other one that leads to failure.


Well they've now put the genius behind Zen in charge (Jim Keller), have they not?


>Today, the software _is_ there (well, mostly) and we can all take advantage of more cores.

Disagree. Not even "mostly". If I had a dime every time I saw some software stuck at 100% of one core... well, I'd have a few dimes a day.


Most consumers use their devices for media consumption, games, and surfing the web. ALL of these are heavily multi-threaded. Yes, even dear old Javascript is parsed and compiled in multiple threads, concurrently GC'd and uses web workers, service workers, and webassembly.

In the business world, all the commonly-used software I can think of is heavily threaded as well. Generally, the remaining software that comes to mind is simple stuff that could run on a decade-old machine without too much trouble.

What typical programs are you running that require so much single-threaded performance?


Using Acrobat DC, searching for content in all PDFs in a folder takes ages. It is both slow AND single-threaded, the disk is barely moving.

Tried with Foxit and a few others, same thing. I had to seek out indexed search apps, and those are few and far between and generally poor quality - kudos to Recoll[1] as the standout so far. But I'd much rather just search within the reader.

Anyway, it's infuriating; this should go roughly as many times faster as there are cores.

[1]: https://www.lesbonscomptes.com/recoll/


Windows Update, Windows Defender Antivirus, NTFS compression, maybe more I've forgotten.

Also old (and not-so-old) games.


Well, that's not at all a fair comparison. Who knows what that one core is doing? Not all algorithms are parallelizable, in fact most aren't. So doing standard, everyday things in code might still max out a core.


Exactly. So single-core performance wins.


> - PCIe Gen4 does matter

Strong disagree there, for Threadripper and Epyc products based on Zen 2. There are use cases for two- and four-port 100GbE PCI-Express NICs in 1RU and 2RU sized x86-64 platform systems where you definitely want as much PCI-Express bus bandwidth as possible.

That's in addition to the obvious use cases like people putting multiple NVMe PCI-Express SSDs in PCI-Express slots. At the low end of the price range there are things like a PCI-Express 3.0 x8 card that holds four M.2 2280 SSDs, an $18 part.

First generation single socket epyc was already a huge improvement in number of pci-express lanes per CPU socket compared to equivalently priced intel product, if you were, for example looking at building a 2RU system with eight or twelve 100GbE interfaces to run juniper JunOS as VMX.

In the single socket higher-end workstation CPU market ($400 to $800 CPUs) Intel has for a very long time had a constrained number of pci-express lanes available per motherboard. In a setup with a single high end video card like a 1080ti or 2080 and a single nvme SSD, it doesn't leave a whole lot of I/O lanes left over.


I cannot follow your argument, so let me give examples: https://www.anandtech.com/show/14729/samsung-preps-pm1733-pc...

Samsung introduced an 8 GB/s SSD specifically for PCIe Gen4. I used to work in the SSD business many years ago and we could _trivially_ saturate PCIe Gen3 x4, but it wasn't economical to go to more lanes. (Bandwidth is easy, latency is hard.)


Somehow I mistakenly read the parent comment as pcie 4.0 doesn't matter, and typed a whole bunch of paragraphs in answer to that. Point was that 4.0 and more lanes are a much needed improvement so I actually am in agreement with the parent comment. Oops.


Maybe I'm missing something, but PCIe Gen4 does seem like a big deal since it's effectively doubling the data rate of the previous gen. Meaning, once all/most of your peripherals are PCIe 4 compatible, you'll theoretically have double the PCIe lanes, on account of only needing half as many.

One immediate benefit that comes to mind is storage controllers. With 8x spinning disks, you might get away with using a single lane, which can provide almost 2 GByte/s of bandwidth. And controllers previously using 8x Gen3 lanes could switch to 4x Gen4 without any loss in performance.
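
Back-of-the-envelope for that (real-world protocol overhead shaves a little off these numbers):

    pcie3_lane_gbps = 0.985   # 8 GT/s with 128b/130b encoding, per lane, per direction
    pcie4_lane_gbps = 1.969   # 16 GT/s, double that

    hdds = 8
    hdd_gbps = 0.25           # a fast 7200rpm drive, roughly
    print(hdds * hdd_gbps)                           # 2.0 GB/s needed for 8 spinning disks
    print(pcie4_lane_gbps)                           # roughly what one Gen4 lane provides
    print(4 * pcie4_lane_gbps, 8 * pcie3_lane_gbps)  # 7.876 vs 7.88 - x4 Gen4 ~= x8 Gen3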


Intel is the last US company with a bleeding edge fab. All other fabs below 15nm are outside the US, except for one Samsung fab in Texas. When Intel falls behind, that's the end of the US as a leader in the semiconductor industry.


I am in no position to evaluate this article, but I found it interesting that it exists more so than the specifics, especially since it was a conservative source: https://www.theamericanconservative.com/articles/americas-mo...

They look at the military angle specifically ("Wall Street's short-term incentives have decimated our defense industrial base and undermined our national security.") but it's wider than that.

Example quotes:

> ...in the last 20 years, every single American producer of key telecommunication equipment sectors is gone. Today, only two European makers—Ericsson and Nokia—are left to compete with Huawei and another Chinese competitor, ZTE.

> ...public policies focused on finance instead of production, the United States increasingly cannot produce or maintain vital systems upon which our economy, our military, and our allies rely.

As for chip production capacity, "N. America" (is there anything in Canada, or is this just the U.S.?) has just 12.8% of worldwide capacity, and 3/4 of capacity is in Asia: https://anysilicon.com/semiconductor-wafer-capacity-per-regi...

The idea of globalization was that where production is located does not matter, market "magic" somehow makes it irrelevant. I don't see how it does not matter when the imbalance becomes as extreme as it is nowadays. "Finance" is not an industry (if anyone wants to argue with me about the definition of that word I refer to https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-... -- you know what I mean).


Is it necessary for manufacturing to be in the United States? Surely secure shipping lanes and a country with a friendly government are enough.


When the shit hits the fan, suddenly you are dependent on that friendly government for something and those necessary manufacturing facilities become a target and that friendly government has leverage over you and maybe becomes not quite so friendly.

It has been so long since something bad has happened that people believe it can't happen. Pax Americana can only last so long and with the way things are going right now it seems closer to its end than its beginning.


If the end users are in the US, I assume there are financial and (more importantly) environmental costs associated with shipping when everything is made in Asia.

We need to re-learn to consume local products if we want to save this planet.


Shipping costs are fairly trivial to the environment compared to manufacturing for semiconductors, because they're extremely value-dense. And the manufacturing process is intensive of energy, water and nasty solvents. Some of the early US semiconductor sites are now Superfund sites.


I didn't know what Superfund was so I googled it:

> Superfund sites are polluted locations in the United States requiring a long-term response to clean up hazardous material contaminations. They were designated under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) of 1980. CERCLA authorized the United States Environmental Protection Agency (EPA) to create a list of such locations, which are placed on the National Priorities List (NPL).

https://en.wikipedia.org/wiki/List_of_Superfund_sites

> Superfund is a United States federal government program designed to fund the cleanup of sites contaminated with hazardous substances and pollutants. Sites managed under this program are referred to as "Superfund" sites. It was established as the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA).[1] It authorizes federal natural resource agencies, primarily the Environmental Protection Agency (EPA), states and Native American tribes to recover natural resource damages caused by hazardous substances, though most states have and most often use their own versions of CERCLA. CERCLA created the Agency for Toxic Substances and Disease Registry (ATSDR). The EPA may identify parties responsible for hazardous substances releases to the environment (polluters) and either compel them to clean up the sites, or it may undertake the cleanup on its own using the Superfund (a trust fund) and costs recovered from polluters by referring to the U.S. Department of Justice.

https://en.wikipedia.org/wiki/Superfund


Sorry, I shouldn't have used the Americanism. I had in mind specifically Fairchild (trichloroethane, arsenic contamination of groundwater) https://cumulis.epa.gov/supercpad/SiteProfiles/index.cfm?fus... , but there are plenty of others.

See also https://www.nytimes.com/2018/03/26/lens/the-superfund-sites-...


Given the tariffs, does the US government count as "friendly to the US" any more?


Finance is the fundamental industry upon which all others are built.

Arguments against free trade or trade “imbalance” are always one or more of the following: economically ignorant; mercantilist; racist; or nationalist. They all deny individual rights in favor of tribalism.


Intel is the last US company with a bleeding edge fab.

Should Intel really be considered a U.S. company these days?

I tend to consider companies like Intel (and IBM, and Apple, and Microsoft, etc.) international companies.


This is why I think it's likely that Uncle Sam will swoop in, in one way or another, to try to save Intel's position as a leading-edge fabricator if it comes to that. I don't know how successful such a rescue would or will be.


It should be noted that the author is Matt Dillon of Dragonflybsd fame.

I'll repost a previous post I made

https://news.ycombinator.com/item?id=15484735 ------

To also give context as to what DragonFly BSD is: DragonFly BSD was forked from FreeBSD 4.8 in June of 2003 by Matthew Dillon, over a difference of opinion on how to handle SMP support in FreeBSD. DragonFly is generally considered to have a much simpler (and cleaner) implementation of SMP, which has allowed the core team to more easily maintain SMP support, yet without sacrificing performance (numerous benchmarks demonstrate that DragonFly is even more performant than FreeBSD [5]).

The core team of DragonFly developers is small but extremely talented (e.g. they have frequently found hardware bugs in Intel/AMD that no one else in the Linux/BSD community has found [6]). They strive for correctness of code, ease of maintainability (e.g. only supporting the x86 architecture, design decisions, etc.) and performance as project goals.

If you haven't already looked at Dragonfly, I highly recommend you to do so.

[5] https://www.dragonflybsd.org/performance/

[6] http://www.zdnet.com/article/amd-owns-up-to-cpu-bug/


It's weird for Intel to be falling behind against AMD. Intel has about 10x the revenue and 10x the number of employees. Yet AMD is able to compete directly, and even beat Intel in some areas.

Good for AMD, but I'm more interested in an explanation of how Intel allowed this to happen.


Intel ties its architectures to its foundry nodes since it both designs the chips and produces them, unlike AMD. This means that they can get some extra performance, but it also means that if there are problems with the new node, like the ones they had with 10nm, everyone just has to sit on their hands until it's resolved. Going forward they've decided to decouple their architectures and nodes so that this doesn't happen again. Despite what other commenters are saying, this issue isn't really a result of them getting complacent. Intel's original plan for their 10nm node was actually extremely ambitious, both in transistor density and in implementation of new techniques. Too ambitious, as it turned out.


> Despite what other commenters are saying, this issue isn't really a result of them getting complacent. Intel's original plan for their 10nm node was actually extremely ambitious [...] Too ambitious, as it turned out.

Couldn't it be both?


Basically Intel does fine when the lowest 2 levels of management are able to keep the company functional.

The decision to bet the company on 10nm came from the top (Krzanich).

If the entire company is dysfunctional, no amount of low-level work can undo that: if the captain runs the ship aground, adding more sailors bailing water doesn't get the ship to its port.

From the outside, Intel looks complacent. To a degree, that's true, but the inside story is more panic and confusion. The third-party vendors who provide the EUV tools are in panic mode. The leadership is in confusion mode (no amount of seppuku can make the old ways work; re-organizing those 2 levels of low-level managers can't fix this; top-down cannot turn back time and assume that 10nm will be this bad).

Consumers really only see the output of the product/sales division, such as which features to disable on which SKUs. Intel is facing an actual engineering failure; there are no bloody features to disable. Taking the few 10nm parts that made it out the door and sticking them in the 14nm "10th generation" is a product/sales decision. Ignoring the people telling you 10nm is a flop - that's a CEO level decision.


I'm surprised this was a surprise. When I entered the semiconductor industry 30+ years ago, the process folks were failing to deliver their fancy new 1-micron process.


Intel has always bet hard on their homegrown fab process and efficiencies winning over competitors.

With 10nm, it's years overdue (we're now on 14nm++++, if my count is right), and Intel's just been incrementally iterating on Skylake in the meanwhile.

I'm really looking forward to the next generation or two, when we see non-mobile 10nm+ chips from Intel and whether they can make up the lost ground between Spectre variants and AMD's gains.


Not even Skylake. Sandy Bridge was the last major improvement to their microarchitectures, and that was in 2011.

People blame this on complacency (and there was definitely some of that, as internal reports have revealed) but it also has to do with their inability to execute on their 10nm process and the fact that they've got everything tied to in-house development; while AMD is able to execute a much more mercenary approach (mixing and matching node processes, sourcing to multiple fabs, utilizing open standards vs baking their own, etc).


The writing was on the wall years ago. Intel became one of those "big company" places without much of a competitive spirit, and they started focusing more on pet projects (technical and not), internal politics, and all the other decay products of bigcorpism instead of continuing to advance chips. AMD, meanwhile, had existential pressure to get better at their core business, and it eventually worked.

It's a pattern you see over and over again.


Somebody made a similar comment [1] not long ago. And I copy my reply below.

I don't particularly like Intel, but that is hardly a fair comparison.

TSMC alone has ~48K employees and $13-15B in R&D.

And even that excludes all the ecosystem and tooling companies around TSMC. Compare that to Intel, which does it all by itself. Not to mention Intel does way more than just CPUs and GPUs: also memory, networking, mobile, 5G, WiFi, FPGAs, storage controllers, etc.

[1] https://news.ycombinator.com/item?id=20646790


I'm not sure such a comparison can hold up, e.g. those employee numbers probably include the people working in Intel's foundries, whereas AMD has outsourced that to TSMC.


Intel has apparently been milking their comfortable first place for years. So it's not weird, they've been coasting for years as they thought the race was won.

I was a little surprised, but not shocked, that Intel's best reaction to the new AMD processors was so underwhelming.

Yet it's what we expect in most markets - and what happened after IE6 (or Chrome) won. Innovation stops, differentiation by anti-consumer means ramps up.


They put a lot of their beans into stuff that died, like Atom; "12,000 Worldwide, 11 Percent" of the Intel workforce was let go because of the Atom chip bugs and failures.


> Intel has about...10x the number of employees.

In all seriousness, it is a miracle Intel is able to compete at all.

Pros of higher headcount: theoretically lower transaction costs for highly specialized people to work together.

Cons of higher headcount: more command and control; fewer competing ideas; no market mechanism to evaluate the best ideas; massively increased politics and executive jockeying; way more middle managers and fewer entrepreneurs; and continuing to invest in failed ventures. I'm sure there are many more cons.


Intel still has the market share, and last time I checked, AMD instances in AWS have less than half the performance of similarly priced Intel instances.

I am an AMD fan and my laptop is Ryzen, but I would not say AMD has already won. It is catching up fast, but they can't stop innovating.


And for laptops everyone is stuck with Intel 14++ until AMD delivers 7nm for mobile (Intel 10nm offerings look very underwhelming).

What a great time to be in the market for a new machine, lots of hot air (literally) and throttling with the 6/8 core Intel i9s.


10nm has to compete with 3 generations worth of improvement on 14nm. Intel dug their own grave there, they gambled on 10nm, they lost, then they gambled on 10nm being good enough to beat improved 14nm processes, and lost again.


If they don't jump straight to 7nm then next iteration of 10nm may be more impressive.

At 6-8 cores a 10nm with low base clock and decent boost speeds would probably be a decent option provided it doesn't throttle in the thin/light laptops of today.

Consumer is caught between worlds, the old power hungry one, and the new efficient one where long battery life and cool/quiet mobile systems are the norm (at least that's the hope).


From what we've seen in first preliminary benchmarks, the 10nm isn't impressive in either Performance or Power Consumption and AMD trumps it in both. The consumer is only trapped if they choose Intel.


Well, Lenovo is already selling AMD-based laptops for business users, and they are quite a thing.


Sure, but not yet on 7nm. Maybe sometime during 2020 that will happen, but there's no public roadmap that I know of that states AMD 7nm mobile processors will be available in X quarter of Y year.


I have a Ryzen 5 laptop but I am very interested in these new 7nm CPUs for the next one.

This one was an anomaly and doesn't feel fully supported by the Linux kernel; I have to add several parameters to the bootloader just to install Linux.

Next iterations should be better.
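
For reference, the workarounds on this generation of Ryzen laptops usually amount to a couple of kernel command-line flags. The flags below (idle=nomwait, iommu=soft, an ivrs_ioapic override) are just the ones commonly reported for these machines, not necessarily the exact set this particular laptop needs:

    # /etc/default/grub -- illustrative only; the right flags vary by model.
    # idle=nomwait works around hangs on some early Ryzen mobile chips;
    # iommu=soft and the ivrs_ioapic override address IOMMU/keyboard quirks
    # reported on a few Ryzen laptops.
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash idle=nomwait iommu=soft ivrs_ioapic[32]=00:14.0"

Then regenerate the GRUB config (e.g. sudo update-grub on Debian/Ubuntu) and reboot.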


> And for laptops everyone is stuck with Intel 14++ until AMD delivers 7nm for mobile

idk, I got a Ryzen 5 HP Envy 15" last year, and am perfectly happy with it. Intel isn't 100% owning the laptop market right now.


Dell has been shipping a 14" Ryzen office laptop for at least the last year. I'm not saying it's the sexiest thing out there, but it does exist.


I think OP is talking nm, not inches. ;)


My point was that we aren't stuck with Intel for laptops, because Ryzen laptops have been shipping for a while now.


14 inches = 355 600 000 nanometers, heh


> (read: Intel trying real hard to keep people on 4 cores so they could charge an arm and a leg for more)

There's a quote, something along the lines of "You have to cannibalise your own business. If you don't, someone else will." A lesson Intel chose to forget because forgetting made them more money, right up until it didn't.


I always felt Apple was good at this, in the Steve Jobs era, anyway. Everyone said the iPhone would kill the iPod, but that didn't matter in reality because the iPhone opportunity was so much larger.


Interesting quote. I shall forward it to my boss (we are a software publisher).


That is basically Clayton Christensen's whole shtick (it may even be his quote). He published a rather influential book 20 years ago (The Innovator's Dilemma) that popularized the idea. It's an interesting read, even if not everything has aged equally well.


That book is on my shelf to be read. In what way did it age poorly?


The book as a whole has aged pretty well (with the caveat that I haven't read it in over a decade). My recollection is that it tended to treat disrupting an entrenched player as a panacea and being disrupted as a death knell. It was somewhat un-nuanced in its disruptor/disruptee split, and doesn't really look much at why some disrupted companies remained perfectly fine and why some disruptors floundered.

Still a great book and it was highly influential on how I think about business. He's also written a bunch of books after that book that may or may not cover some of those points.


I think number one is definitely wrong. Recent inquiry led reviewers to discover that not all boards are created equal. While the differences aren't drastic, on some boards you can't reach the boost clock written on the box.

https://youtu.be/o2SzF3IiMaE


I think this is more of a problem with the motherboard manufacturers than with AMD. The manufacturers decide which boards get a Ryzen 3000 stamp of approval, and BIOS support. Though maybe AMD should have been more stringent about their motherboard requirements regarding forward compatibility with Ryzen 3000.

Even if there were more serious issues, I would argue that AMD did the right thing keeping the AM4 socket. Since this varies from board to board, it's obviously not a deficit of the socket itself. At the end of the day, nobody is forcing you to keep your old motherboard. You just have the option to. If you can save 25% by not buying a new motherboard, it probably justifies a small performance deficit.


On most motherboards you can't reach the boost clock written on the box. The handful that can seem to be the exception rather than the rule, and even most of the motherboards specifically designed for the current generation of AMD CPUs can't do it. The ones that can are generally quite expensive, though not all of the expensive boards achieve this - in fact, some of them seem to be worse than their cheaper counterparts from the same manufacturer!


Reaching the max boost clock could just mean some boards boosting to 4.5 GHz for a nanosecond and then immediately dropping back, while others keep a lower but stable clock. Aren't average clocks more important?


I think that's the point. Most cores don't ever reach the advertised boost clocks by default (let alone exceed them, as AMD's commercial explaining PBO advertised). It seems you need to manually overclock each core via trial and error, and even then you won't get overclocks on all cores.
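
A quick way to see which of the two you're actually getting on Linux is to sample per-core frequencies while a workload runs and compare peak vs. average. This is just a rough sketch (it reads the "cpu MHz" fields from /proc/cpuinfo, which is coarse but enough to tell a momentary spike from a sustained clock):

    #!/usr/bin/env python3
    # Sample per-core clocks on Linux and report peak vs. average frequency.
    # Run it alongside a workload; a big gap between peak and average suggests
    # the board only touches the boost clock briefly.
    import re
    import time

    SAMPLES = 30       # how many readings to take
    INTERVAL = 0.5     # seconds between readings

    def read_core_mhz():
        """Return the current frequency of each core in MHz."""
        with open("/proc/cpuinfo") as f:
            return [float(m.group(1)) for m in
                    re.finditer(r"^cpu MHz\s*:\s*([\d.]+)", f.read(), re.M)]

    history = []
    for _ in range(SAMPLES):
        history.append(read_core_mhz())
        time.sleep(INTERVAL)

    for core in range(len(history[0])):
        readings = [sample[core] for sample in history]
        print(f"core {core}: peak {max(readings):.0f} MHz, "
              f"average {sum(readings) / len(readings):.0f} MHz")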


Intel has been in this situation before and found its way out of it. They may pull another rabbit out of the hat like they did with the Core architecture, or they could always just buy AMD.


Buying AMD would make regulators fairly angry. The EU is just waiting for a chance to beat up Intel over its market dominance and anti-competitive shenanigans over the years.


They cannot buy AMD.


Given the financial positions involved they could easily afford to (market cap order of 30 vs 200 billion USD, Intel free cash flow per quarter of 5-15 billion). Regulatory authorities probably wouldn't let them, but that is another argument to make.


It's the same argument - it's just not going to happen, regardless of monetary considerations.


It would be a monopoly.


> on a Zen 2 system if you increase the voltage to push frequency you also wind up increasing the temperature which retards the maximum possible stable frequency

Well, that effect is there on Zen 1 to an extent, but you can overpower it with >1.4V.


Does anyone know of any EPYC 7002 servers, preferably single socket ones, that also support EDSFF E1.L SSDs?


Just get Supermicro to make it for you.


Will this affect low-end CPU prices?


A bit off topic but this is the best webpage design I've seen on the HN frontpage in months.


http://bettermotherfuckingwebsite.com/

I actually used this template for an internal development tool and everyone loves how fast and simple it is.


Was this website written by Maddox? I haven't seen prose like this since the early 2000's. It's way over the top and never calms down.


By the looks of it, no; Drew McConville is not George Ouzounian. Guess Drew's just a fan.


confirmed


Regarding the line width section, I really wish people had developed the habit of resizing browser windows. Though unfortunately these days resizing the window makes fingerprinting significantly easier.


I have dozens of webpages open in my browser window. The tabbed UI means ALL webpages have the same window size. If I were to resize my window for various sites (or even various locations on the same site or page) I'd do nothing else but resize. Just switching back and forth between tabs would then often require resizing the window! I think the idea of "make websites full width, users can resize their window" builds on a few assumptions that were perhaps true on a Linux desktop in 1996 but aren't anymore:

- Users have reasonable window managers for non-maximized windows (they don't; they mostly run Windows).

- Websites mostly display text that can flow nicely.

- Web browsers have one window per web page.


I love the drag-window-to-edge-to-split-screen behavior in windows.


Yes, though I wish there were something to split it into three, for example. On an ultrawide monitor you don't want your browser on either the left or the right half; you want a three-column layout.


Niche product: ability to resize the viewport within a tab


Sort of like a better UI for the mobile viewport/responsive design dev tools in browsers


It's worse than the original because the contrast is lower.



Thanks, that's the real best one. I tried to find it with HN's search engine but failed because I forgot about the weird TLD.


That one actually hurts to look at on a calibrated monitor. Like, I literally get migraines from that contrast.

A lesson on why CSS should be able to specify the nits and colorspace you’d like your content to be displayed in, so it can be readable on your shitty screen, and avoid giving me headaches on my screen.


Well, if you do not like a white background, just change the default background in your browser config. I have light gray and it is perfect. I do not see why websites should override default colors unless they have a good reason for specific colors.

Also, the color calibration AFAIK does not specify brightness. Perhaps the headaches are not from contrast, but from too much brightness?


Brightness is also specified for different media — between 100 and 300 nits for SDR content, and 1000 nits for HDR.

Text becomes painful if the contrast is too high, but if you have a cheap screen at 70 nits it won't ever be an issue.


Too bad the page's source isn't properly formatted, especially if it's meant to make a point about how to do things properly.


Now that is really a great website design.


Eh, I don't really like text that fills all the way to the edges. It's a bit difficult to read that way. The http://bettermotherfuckingwebsite.com/ posted below definitely reads better to me (though expanding line length a bit might still apply to longer articles).


https://ux.stackexchange.com/questions/108801/what-is-the-be...

I picked up a tidbit about optimal line widths in one of my courses and it's stuck with me. I think back to it whenever I see websites that don't constrain the width.


It’s kind of awful on mobile, though.


Fixing that would just be adding one line to the head though:

    <meta name="viewport" content="width=device-width, initial-scale=1">


Works fine in reader mode but yeah it's totally unreadable on mobile.


Anyone know why mobile browsers have this default rendering?


Probably because websites used to not be mobile-optimized and they wouldn't fit on the screen if you didn't set a large viewport.


It looks fine for me on Android Brave.


Most users run full-screen browsers. Yes, we run full-screen browsers across our 1920, 2560, and 3840 wide screens. There is no way a web designer could or should say "well, that's the fault of users". That's just reality.

Users want a readable-width column in front of their eyes. That is: it shouldn't be full width; it has to have a smaller maximum size. And the column has to be centered, because the left edge of a big screen is way too far off-center to read comfortably.

Agree on the minimalist design being a breath of fresh air, though. It just needs reasonable paragraph formatting and it's perfect.


If you're insane enough to full-screen the web on a large wide-screen monitor, can't you just hit the reader mode button in your browser?


That's a great idea. I always forget that exists. (It doesn't help on all pages, but in examples like this it's perfect.)


This nears a design I call Spartan Webdesign [0]. The UX on mobile devices is not there yet, but I agree this is one of the better sites we've seen here in a long time :-)

[0] https://www.sinax.be/blog/general/guidelines-for-spartan-web...


The line length just expands to fill the screen, which means that if you happen to have the "right" device/screen size, it will look good and be easy to read. But on many (probably most) screens the lines will be far too long to be easily readable.


I'm not a fan of text that stretches across the whole screen, but I do like the use of <ul> to slightly indent the paragraphs with respect to the headings. I don't know how common that is, but it's the first time I've seen it.


It's among the worst on mobile. Luckily Firefox for Android has reader mode.


Also iOS “safarifox” (Firefox:) ) and Safari. It failed to render it properly though.


html3 + readability = happiness


Not really the best, as I hate full-width webpages, especially on a 21:9 monitor, but I do like minimal webpages.

Somewhere along the line people just started wanting to make web apps, even when the app itself is no different from a page.


The width is just too big for my taste. I hate having to move my eyes left and right.


Maybe extend your arm, or if you're sitting in front of a monitor, move your chair back a bit to eliminate the calories burned from having to swivel your eyes.


bruh have you seen the home page

http://apollo.backplane.com


I would agree with you but it's unreadable on mobile.


It uses <center> tags, though, I assume unironically. Those have long been deprecated.

A tiny bit of CSS would make it perfect.
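
Something like the following is the kind of "tiny bit" I have in mind; a sketch, not the site's actual stylesheet (the values are just placeholders):

    /* Replace the deprecated <center> tags with a centered, readable column. */
    body {
      max-width: 70ch;     /* keep line length comfortable on wide screens */
      margin: 0 auto;      /* center the column without <center> */
      padding: 0 1em;      /* breathing room on narrow/mobile screens */
      line-height: 1.5;
    }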


That covers CPU counts, temperatures, etc., but does it hold for whole systems? Suppose you run Nvidia chips for AI: does this matter, and how? If you use Adobe Premiere or Apple's equivalent, does it matter? Does the CPU still matter?



