This is really not a big deal with RISC-V's 2 instruction lengths and the encoding they use.
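
To make the encoding concrete: you can read a RISC-V instruction's length from the two lowest bits of its first 16-bit parcel. The values 00, 01 and 10 are the three compressed (16-bit) quadrants, and 11 marks a 32-bit instruction. Below is a minimal Python sketch. The function name `insn_length` is hypothetical, and the sketch treats every `11` parcel as 32-bit, ignoring the longer formats the spec reserves.

```python
def insn_length(first_parcel: int) -> int:
    """Byte length of the RISC-V instruction whose first 16-bit parcel is given.

    Sketch only: bits [1:0] of 00, 01 or 10 mean a 16-bit compressed
    instruction; 11 is treated as 32-bit (longer reserved formats ignored).
    """
    return 4 if (first_parcel & 0b11) == 0b11 else 2


# c.nop is 0x0001; the 32-bit nop (addi x0, x0, 0) is 0x00000013,
# whose low parcel is 0x0013.
print(insn_length(0x0001))  # 2
print(insn_length(0x0013))  # 4
```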

If you decode 32 bytes of code at a time (256 bits, somewhere between 8 and 16 instructions), you can figure out where all the actual instructions start (yes, even the 16th instruction) with 2 layers of LUT6.
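
The function those LUTs compute fits in a simple serial reference model: parcel i+1 begins an instruction unless parcel i begins a 32-bit one. The sketch below assumes the 32-byte block is a list of sixteen 16-bit parcels and that the block itself begins on an instruction boundary. `start_mask` is a hypothetical name, and the loop describes the logic function only, not the two-level LUT circuit.

```python
def start_mask(parcels: list[int]) -> list[bool]:
    """For each 16-bit parcel, True if an instruction begins there.

    Assumes parcels[0] begins an instruction. Serial reference model:
    start[i+1] = not (start[i] and parcel i looks 32-bit).
    """
    starts = []
    expect_start = True
    for p in parcels:
        starts.append(expect_start)
        looks_32 = (p & 0b11) == 0b11
        # A 32-bit instruction starting here consumes the next parcel;
        # otherwise (16-bit instruction, or this parcel was a second half)
        # the next parcel starts a new instruction.
        expect_start = not (expect_start and looks_32)
    return starts


# 32-bit nop, 16-bit c.nop, 32-bit nop, 16-bit c.li a0, 0
block = [0x0013, 0x0000, 0x0001, 0x0013, 0x0000, 0x4501]
print(start_mask(block))  # [True, False, True, True, False, True]
```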

You can then use those outputs to mux between two possible starting positions for each of 8 decoders that handle 16- or 32-bit instructions, plus 8 decoders that only ever handle 16-bit instructions from fixed start positions (and might output a NOP, or in some other way indicate they have no input).
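
One way to read the first option (an interpretation, not spelled out above): give each of the 8 four-byte-aligned words in the block one 16/32-bit decoder and one 16-bit-only decoder. Every aligned word contains at least one instruction start. If its low parcel is not a start, that parcel is the second half of a 32-bit instruction, so the high parcel must be a start. The 16/32-bit decoder therefore takes the word's last start: the high parcel if it is a start, otherwise the low one. The 16-bit-only decoder sits at the low parcel and is valid only when both parcels are starts, in which case the low instruction must be 16-bit. The Python model below returns the parcel positions that would be decoded. All names are hypothetical, and a 32-bit instruction straddling the end of the block is not handled.

```python
def looks_32(p: int) -> bool:
    return (p & 0b11) == 0b11


def start_mask(parcels: list[int]) -> list[bool]:
    # Serial reference model; assumes parcels[0] starts an instruction.
    starts, expect = [], True
    for p in parcels:
        starts.append(expect)
        expect = not (expect and looks_32(p))
    return starts


def option1_positions(parcels: list[int]) -> list[int]:
    """Parcel indices decoded by option 1, in program order."""
    starts = start_mask(parcels)
    decoded = []
    for j in range(len(parcels) // 2):
        lo, hi = 2 * j, 2 * j + 1
        # 16-bit-only decoder at fixed position lo: valid only when both
        # parcels are starts (then the instruction at lo must be 16-bit).
        if starts[lo] and starts[hi]:
            decoded.append(lo)
        # 16/32-bit decoder: its input is muxed between lo and hi by starts[hi].
        decoded.append(hi if starts[hi] else lo)
    return decoded


block = [0x0013, 0x0000, 0x0001, 0x0013, 0x0000, 0x4501, 0x0001, 0x0001]
print(option1_positions(block))  # [0, 2, 3, 5, 6, 7]
```

Each decoder only ever needs a two-way input mux, driven by a single start bit of its own word.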

OR you can use those outputs to mux the outputs of 8 decoders that only handle 32-bit instructions and 8 decoders that handle 16 or 32 (all with fixed starting positions), plus, again, 8 decoders that only handle 16-bit instructions from fixed start positions (possibly unused / NOP).
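
Under the same per-word reading, the second option moves the muxes after the decoders. Each word gets a 32-bit-only decoder fixed at its low parcel, a 16/32-bit decoder fixed at its high parcel, and a 16-bit-only decoder fixed at its low parcel. All three decode speculatively while the start mask is being computed, and the start bits only choose which results survive. If the high parcel is not a start, the low parcel must begin a 32-bit instruction, which is why a 32-bit-only decoder is enough there. The decoder functions below are hypothetical stand-ins that only report instruction kind and position.

```python
def looks_32(p: int) -> bool:
    return (p & 0b11) == 0b11


def start_mask(parcels):
    # Serial reference model; assumes parcels[0] starts an instruction.
    starts, expect = [], True
    for p in parcels:
        starts.append(expect)
        expect = not (expect and looks_32(p))
    return starts


# Hypothetical decoder stand-ins: each just reports (kind, position).
def decode_32_only(parcels, pos):
    return ("rv32", pos)


def decode_16_or_32(parcels, pos):
    return ("rv32" if looks_32(parcels[pos]) else "rvc", pos)


def decode_16_only(parcels, pos):
    # A real 16-bit-only decoder would flag a non-RVC input as invalid / NOP.
    return None if looks_32(parcels[pos]) else ("rvc", pos)


def option2_decode(parcels):
    n_words = len(parcels) // 2
    # Stage 1: every decoder works at a fixed position, independent of the start mask.
    c32 = [decode_32_only(parcels, 2 * j) for j in range(n_words)]
    c16or32 = [decode_16_or_32(parcels, 2 * j + 1) for j in range(n_words)]
    c16 = [decode_16_only(parcels, 2 * j) for j in range(n_words)]
    # Stage 2 (in hardware, concurrent with stage 1): the start mask.
    starts = start_mask(parcels)
    # Stage 3: start bits select which decoded results survive.
    out = []
    for j in range(n_words):
        lo, hi = 2 * j, 2 * j + 1
        if starts[lo] and starts[hi]:
            out.append(c16[j])
        out.append(c16or32[j] if starts[hi] else c32[j])
    return out


block = [0x0013, 0x0000, 0x0001, 0x0013, 0x0000, 0x4501, 0x0001, 0x0001]
print(option2_decode(block))
# [('rv32', 0), ('rvc', 2), ('rv32', 3), ('rvc', 5), ('rvc', 6), ('rvc', 7)]
```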

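The first option uses less hardware (16 decoders rather than 24, half of them 16-bit only) but has higher latency, because its decoders cannot start until the start positions are known. The second option decodes every fixed position in parallel with working out the starts, and the start bits only drive the final output muxes.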

That, again, is for decoding between 8 and 16 instructions per cycle, with an average on real code of close to 12.
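
The 8 to 16 range follows from a per-word invariant: each 4-byte-aligned word holds one or two instruction starts. The brute-force Python check below enumerates every possible pattern of "looks 32-bit" bits for a 16-parcel block and confirms both the invariant and the bounds. It says nothing about the average of close to 12, which is a property of real code. The helper names are hypothetical, and the block is assumed to begin on an instruction boundary.

```python
from itertools import product


def start_mask_from_bits(looks_32_bits):
    # start[i+1] = not (start[i] and bit i); block assumed to begin on a boundary.
    starts, expect = [], True
    for b in looks_32_bits:
        starts.append(expect)
        expect = not (expect and b)
    return starts


def instruction_count_bounds(n_parcels=16):
    """Return (min, max) instruction starts over all possible parcel patterns."""
    counts = []
    for bits in product((False, True), repeat=n_parcels):
        starts = start_mask_from_bits(bits)
        # Every aligned 4-byte word must contain at least one start.
        assert all(starts[2 * j] or starts[2 * j + 1] for j in range(n_parcels // 2))
        counts.append(sum(starts))
    return min(counts), max(counts)


print(instruction_count_bounds())  # (8, 16)
```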

That is more than is actually useful on ordinary, branchy code.

In short: not a problem. Unlike x86 decoding.