I think on HN we always underestimate how much momentum matters. Anthropic has so much clout and mindshare that even if they continue burning goodwill and everyone on HN ditches Claude Code and stops recommending it, they will still be the revenue leader for years to come. Those enterprise contracts aren’t month-to-month.
My experience has been that this isn’t generally true, mainly because worse models pursue red herrings or get confused and stuck. A better model will get to the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.
The real question I see nobody asking is how GPT-5.4 beats Opus at a fraction of the price. I doubt it’s only a question of subsidization. My impression from the past is that GPT-5 was around a Sonnet-sized model, and 5-mini was Haiku-sized. On my codebase at least, Codex one-shots tricky things that Opus needs several tries to fully get right.
It’s typically equivalent, sometimes better, sometimes behind. Better at following a well-defined plan, less good at concept exploration and planning imo.
People make a hobby out of tricking chat apps into leaking their system prompts. But I doubt there’s much to be gained by using this one vs coming up with a custom prompt.
Yeah I set thinking as my default and never looked back. It’s my daily driver and extended thinking is usually not too slow. The way that the “instant” model trades quality for speed is not worth it and I don’t need the instant gratification. (But I also don’t do entertainment chatting so ymmv.)