Opus 4, with enough context, could do most all I wanted in a single shot. More often than not, when I had a bad outcome and was frustrated I would realize that I was the problem (in giving improper direction or missing key context).
I also was in a pretty sweet position having a boat load of credits and premo vertex rate limits so I could 'afford' to dump hundreds of thousands of tokens in context all day.
With Opus 4.5 and 4.6, I find I have to steer very actively.
This is comparing using Opus 4 directly rather than comparing the performance of the models in Claude Code for example, or any 'agentic' setup.
I thought that was the least likeable part of the article. They speculated wildly, somehow making the leap that a trained astronaut would not resort to a computer reset if the problems persisted to weave the narrative that this bug was super-duper-serious indeed. They didn't need that and it weakened the presentation.
It doesn't matter what your intention was, the point was that that was what it did. You didn't think hard enough about how people would use it given the (lack of) constraints and moderation you designed it with and you didn't take responsibility / accountability for what you had facilitated.
Your response to this seems to be: "People need to be more thick-skinned, I would take it down if they asked & it's unfair that they escalated to the dean." as if that somehow invalidates the criticism you're receiving. It doesn't. The fact that you don't really own up to that shows a lack of emotional maturity which makes it hard for people to feel sympathy for you. Until you actually understand why people had problems with what you made, you won't get a lot of sympathy for how you were treated either.
It's like those videos where some clueless racist spouting off garbage on the subway eventually gets punched by someone else. Yes, the assault is definitely "more" wrong. But you won't find a lot of people feeling sorry for the racist.
> In several EU western countries the most common gynelogical surgery act is re-building the hymen (so that the woman can pretend she's a virgin once she marries, often forcibly by her family).
You'd be getting practically the same result. If someone is too lazy to write their own commit messages they're definitely too lazy to write this blog post manually.
reply