It's worse: Google added and then dropped even experimental support for jxl in 2021-2022. Several years later they adopted the Rust jxl library, but have kept it behind experimental flags.
Nope, current flagship models are very happy to make huge missteps across the whole development stack of design, planning, implementation, and testing -- but playing different models against each other can help catch the more egregious issues.
This is a very recent change in model behavior: for me, it showed up with Opus 4.6, Gemini 3.1 Pro, and ChatGPT 5.4(ish). Prior models and harnesses suffered much more from sycophancy.
(I still prompt some questions and reviews with "our intern suggested..." to allow models to judge the quality of the content apart from the messenger)
I've found this surprisingly effective. Higher "thinking levels" may result in more than one approach being considered, but you can also tell your LLM to do brainstorming explicitly: https://photostructure.com/coding/claude-code-replan/
Did that happen to a lot of companies during the log4shell fiasco? I'm sure some companies had their permissions misconfigured such that a malicious actor who could execute code on their servers could also drop their database and delete their backups.