Because the fastest cache levels are tiny, even in the largest and most advanced CPUs. There's plenty of evidence for the performance benefits of improved density and terseness in both code and data.
The M1 has a 192 KB L1 instruction cache on each performance core, which is not ‘tiny’.
If there is lots of evidence for the performance benefits of improved density over fixed instruction width in real-world CPUs, then I’m sure you’ll be able to cite it.