Yeah, it's a bit like taking the output of a student project in a compiler construction class and using it to judge whether said student is capable of innovation without telling them in advance they'd be judged on that rather than on the stated requirements of the course.
It'd be interesting to prompt it to do the same job but try to be innovative.
To your point, yeah, I mostly don't want AI to be innovative unless I'm asking for it to be. In fact, I spend much more time asking it "is that a conventional/idiomatic choice?" (usually when I'm working on a platform I'm not super experienced with) than I do saying "hey, be more innovative."
Yeah, I'd love to find time to. But I think that's also a "later stage" thing. If you want to come up with novel optimizations, for example, it's better to start with a working but simple compiler, so the model can focus on a single improvement at a time. Trying to innovate on every aspect of a compiler from scratch is an easy way to land in a quagmire that takes ages to get out of, for humans as well.
E.g. the Claude compiler uses SSA because that's what it was directed to use, and that's fine. A follow-up would be to have it implement a set of conventional optimizations, then ask it to research novel alternatives to SSA that allow porting the existing optimizations (plus some additional ones) and to show it can get better results or simpler code. That would be a really interesting test, and one that might be possible to judge objectively enough (e.g. code complexity metrics vs. benchmarked performance), though validating correctness of the produced code gets a bit thorny (the same approach of compiling major existing projects with good test suites is a good start).
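For anyone unfamiliar with why SSA matters here: the core idea is that each variable is assigned exactly once, so every use points at exactly one definition, which is what makes optimizations like constant propagation easy to write. A minimal sketch of the renaming step for straight-line code (no branches, so no phi nodes; the representation and names are made up for illustration):

```python
def to_ssa(stmts):
    """Rename straight-line assignments into SSA form.

    stmts: list of (target, used_vars) pairs, e.g. ("x", ["x", "y"])
    for a statement like `x = x + y`. Returns the same list with
    every name given a version suffix, so each target is unique.
    """
    version = {}  # latest SSA version of each variable
    out = []
    for target, uses in stmts:
        # Uses refer to the latest version; unassigned vars get version 0.
        renamed_uses = [f"{v}{version.get(v, 0)}" for v in uses]
        # Each assignment mints a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", renamed_uses))
    return out

# x = 1; x = x + y; z = x  becomes  x1 = 1; x2 = x1 + y0; z1 = x2
print(to_ssa([("x", []), ("x", ["x", "y"]), ("z", ["x"])]))
```

The "novel alternative to SSA" experiment would be asking the model to invent a different IR invariant that still gives optimizations this one-definition-per-use property, which is what makes the test interesting.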
If I had unlimited tokens, this is a project I'd love to do. As it is, I need to prioritise my projects, as I can hit the most expensive Claude plan's subscription limits every week with any of 5+ projects of mine...