
Variable instruction sizes have a cost, but with only two instruction sizes, as in current RISC-V, that cost stays very low as long as we don't have to decode a very large number of instructions each cycle, and it buys a huge code density advantage.
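To make "only two instruction sizes" concrete: in RISC-V the length is encoded in the low bits of the first 16-bit parcel. If bits [1:0] are not `11` the instruction is a 16-bit compressed one; otherwise, if bits [4:2] are not `111`, it is a standard 32-bit instruction. A minimal sketch of that check follows. The function name `rv_insn_length` is made up for illustration, and the reserved longer encodings are simply rejected rather than decoded.

```python
def rv_insn_length(parcel: int) -> int:
    """Return the RISC-V instruction length in bytes, given its first 16-bit parcel.

    Handles only the 16-bit (C extension) and 32-bit formats; the reserved
    longer encodings raise ValueError.
    """
    if parcel & 0b11 != 0b11:
        return 2  # compressed instruction
    if (parcel >> 2) & 0b111 != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError("reserved encoding longer than 32 bits")


print(rv_insn_length(0x4501))  # c.li a0, 0 -> 2
print(rv_insn_length(0x0513))  # low parcel of addi a0, zero, 0 (0x00000513) -> 4
```

Only a handful of bits are needed per parcel, which is why two sizes are cheap compared with x86-style length decoding.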


The biggest issue is one instruction spanning two cache lines, or even two pages. That creates a bunch of tricky cases that are a source of bugs and overhead.

It also means you cannot tell instruction boundaries until you directly fetch instructions, so you cannot do any predecode in the cache that would help you figure out dependencies, branch targets, etc. These things matter when you are trying to fetch 8+ instructions per cycle.
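The spanning case above can be sketched as a simple range check, assuming 64-byte cache lines and 4 KiB pages (both are illustrative parameters, not values from the thread). With the C extension, 32-bit instructions only need 2-byte alignment, so they can straddle either boundary; with 4-byte alignment and no C extension, a 4-byte instruction can never cross a 64-byte line.

```python
LINE_BYTES = 64    # assumed cache-line size
PAGE_BYTES = 4096  # assumed page size


def crosses_boundary(addr: int, length: int, boundary: int) -> bool:
    """True if the byte range [addr, addr + length) straddles a multiple of `boundary`."""
    return addr // boundary != (addr + length - 1) // boundary


# A 4-byte instruction at a 2-byte-aligned address (legal with the C extension):
print(crosses_boundary(0x103E, 4, LINE_BYTES))  # True: split across two cache lines
print(crosses_boundary(0x103E, 4, PAGE_BYTES))  # False: same page
print(crosses_boundary(0x1FFE, 4, PAGE_BYTES))  # True: split across two pages
# The same instruction 4-byte aligned stays inside one line:
print(crosses_boundary(0x103C, 4, LINE_BYTES))  # False
```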


> This biggest issue is one instruction spanning two cache lines

Even with fixed 32-bit instructions aligned on 32-bit boundaries, you face this kind of issue when you have to decode a group of 8 instructions.

So you either have to cut the instruction group short (and thus not take full advantage of the 8-wide decoder), or you have to implement a more complex prefetch with a longer pipeline, and these special cases can then be handled in the extra pipeline stages.
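The fixed-length case is easy to quantify: 8 instructions of 4 bytes is a 32-byte fetch group, so with an assumed 64-byte cache line, any group that starts past the middle of a line (for example at a branch target) runs off the end of that line. A small sketch, with hypothetical parameter names, of how many instructions fit before the line boundary:

```python
def insns_before_line_end(pc: int, line_bytes: int = 64,
                          insn_bytes: int = 4, group: int = 8) -> int:
    """Number of fixed-size instructions of a fetch group that fit in pc's cache line."""
    remaining = (line_bytes - pc % line_bytes) // insn_bytes
    return min(group, remaining)


print(insns_before_line_end(0x1000))  # 8: group starts at the beginning of a line
print(insns_before_line_end(0x1020))  # 8: exactly fits in the second half of the line
print(insns_before_line_end(0x1030))  # 4: group is cut, or the fetch must read two lines
```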

> It also means you cannot tell instruction boundaries until you directly fetch instructions

I mean, AMD does that kind of predecode on x86, where instructions can be anywhere from 1 to 15 bytes long.

It can be done for RISC-V much more cheaply than for x86, and it takes significantly less die area than the bigger cache you would need to compensate for lower code density.
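One way to see why boundary-finding is cheap for RISC-V: a length guess can be computed independently at every 16-bit offset of a fetch block (in hardware, all in parallel), and the real boundaries are then found by following those lengths from the entry point. The sketch below does the chaining serially and is only an illustration of the idea, not AMD's or any vendor's actual predecode design. It assumes only 16- and 32-bit instructions, and the function names are hypothetical.

```python
from typing import List


def predecode_lengths(parcels: List[int]) -> List[int]:
    """Length guess (bytes) at every 16-bit offset; each depends only on its own parcel."""
    return [2 if p & 0b11 != 0b11 else 4 for p in parcels]


def mark_boundaries(parcels: List[int], start: int = 0) -> List[int]:
    """Parcel indices where instructions actually start, chained from `start`."""
    lengths = predecode_lengths(parcels)
    starts = []
    i = start
    while i < len(parcels):
        starts.append(i)
        i += lengths[i] // 2  # advance by 1 parcel (16-bit) or 2 parcels (32-bit)
    # If i ends up past the block, the last instruction spills into the next block.
    return starts


# c.li a0, 0 | addi a0, zero, 0 (two parcels, low half first) | c.li a0, 0
block = [0x4501, 0x0513, 0x0000, 0x4501]
print(mark_boundaries(block))  # [0, 1, 3]; the guess at index 2 is discarded
```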


Given that compressed instructions are an extension (and, as far as I can tell, the 16-bit alignment of 32-bit instructions that allows that sort of spanning is part of that extension), couldn't you implement the extension for tiny hardware where every byte counts, and then, for hardware where you want to fetch 8+ instructions per cycle, just ... not implement it?

Wait (he says to himself, realising he's an idiot immediately -before- posting the comment for once). You said upthread the C extension is specified as part of the standard UNIX profile, so I guess people are effectively required to implement it currently?

If that were changed, would that be enough to dissolve the issues for people wanting to design high-performance implementations, or are there other problems inherent in the extension having been specified at all? (Apologies for the 101-level questions; the only processor I really understood was the ARM2, so my curiosity vastly exceeds my knowledge here.)


Have the ARM AArch64 designers ever commented on this? They intentionally left out any kind of compressed instructions, and certainly Apple at least cares a lot about code size.


Try this at 34:30 - from Arm’s architecture lead Richard Grisenthwaite. Earlier he says that several leading micro architects think that mixing 16 bit and 32 bit instructions (Thumb2) was the worst thing that Arm ever did.

https://m.soundcloud.com/university-of-cambridge/a-history-o...


He explicitly specifies that those micro-architects are at companies OTHER than ARM.

His own opinion appears to be that the worst thing ARM ever did was T2EE, designed for JIT compilers and compilers for dynamic languages. He says that by the time the chips came out compiler technology had advanced to the point that it was no longer useful and no one else used it.

A couple of other points picked up in the talk:

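- He mixes up Hennessy and Patterson with respect to SPARC and MIPS (Hennessy led MIPS at Stanford; Patterson led Berkeley RISC, from which SPARC descended).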

- The A64 effort started in 2007, so it took about 5 years to reach freeze/publication, the same as RISC-V.

- The A64 architects thought code density was no longer important. Some people definitely disagree with that. At the time they probably thought amd64 was the only competition, and matching or beating it was good enough.

- He seems to regret the shifted second operand: it fell naturally out of the 1985 micro-architecture, but it's a burden now. And yet it was included in A64, presumably because the initial processor pipelines had it anyway to support A32. But now we have A64-only CPUs.

- LL/SC was the wrong thing to do.



