When you realize that compiler optimizations account for only about 10% of a program's overall performance, you find that the other 90% is entirely up to the programmer.
Architecture, data structures, batching operations, memory locality, and a host of other concerns are things the compiler can't help you with at all, and they have a much larger impact on performance than the 10% the compiler is actually able to optimize.
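To make the memory-locality point concrete, here's a minimal sketch (hypothetical types and names, not taken from any particular talk) of the classic array-of-structs vs. struct-of-arrays layout change, a transformation no compiler will do for you:

```cpp
#include <vector>

// Array-of-structs: summing just `x` still drags every other field of
// each Particle through the cache, wasting most of each cache line.
struct Particle {
    float x, y, z;
    float mass;
    int   id;
    bool  alive;
};

float sum_x_aos(const std::vector<Particle>& ps) {
    float total = 0.0f;
    for (const Particle& p : ps) total += p.x;  // sparse, cache-hostile access
    return total;
}

// Struct-of-arrays: all `x` values are contiguous, so every byte the
// cache fetches is a byte the loop actually uses.
struct Particles {
    std::vector<float> x, y, z;
    std::vector<float> mass;
};

float sum_x_soa(const Particles& ps) {
    float total = 0.0f;
    for (float v : ps.x) total += v;  // dense, prefetcher-friendly access
    return total;
}
```

On large arrays the SoA loop can run several times faster purely because every fetched cache line is fully used, and that gap never shows up in what the optimizer reports.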
The problem is that programmers either don't care or can't distinguish premature optimization from architecture planning.
You're right to emphasise good data structures and algorithms (also concurrency, parallelism, etc.), but compiler optimisation is nothing to sneeze at. '10%' is laughably off-base.
In a sense you're right, but hand-tuning assembly is kind of an orthogonal problem to determining whether you're using the right algorithms and data structures.
Talks from Mike Acton and Scott Meyers, specifically "Data-Oriented Design and C++" and "CPU Caches and Why You Care", respectively.
I forget exactly where I got that number, but it's been a pretty good rule of thumb so far.
In a nutshell: the compiler is great at micro-optimizations and absolutely terrible at macro-optimizations. The former will get you a few percent of perf boost, while the latter usually results in orders-of-magnitude performance gains.
It's near impossible to apply macro-optimizations at the end of a project without massive refactoring.
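To illustrate the split with a hypothetical example: the compiler can unroll or vectorize the inner loop of the quadratic version below (the few-percent micro win), but it will never swap in the hash-set version (the orders-of-magnitude macro win):

```cpp
#include <cstddef>
#include <vector>
#include <unordered_set>

// Micro-optimizable: the compiler can unroll or vectorize these loops,
// but it cannot change the O(n^2) shape of the algorithm.
int count_duplicates_quadratic(const std::vector<int>& v) {
    int dups = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) { ++dups; break; }
    return dups;
}

// Macro-optimization: a different data structure turns O(n^2) into
// O(n), a change no optimizer will ever make for you.
int count_duplicates_linear(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    int dups = 0;
    for (int x : v)
        if (!seen.insert(x).second) ++dups;  // insert() reports if already present
    return dups;
}
```

Both functions return the same count; only the algorithmic shape differs, and that shape is exactly the kind of decision that's painful to change late in a project.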