
sutterd · 2026-04-26T19:34:14 1777232054

I never adopted Opus 4.6 because it was too prone to doing things on its own. Anthropic called it "a bias towards action". I think 4.5 and 4.7 are much better in this regard. I'm not saying they are immune to this kind of thing though.

sutterd · 2026-04-26T18:32:00 1777228320

I’m working on a side project and AI is writing all the code. The code it produces is not good, and this comes from someone who has experience producing bad code. One thing I’m worried about is places like GitHub being full of AI code, which leads to AI being trained on AI code. It seems like this will lead to a downward spiral.

sutterd · 2026-04-23T20:32:30 1776976350

What kind of performance are people getting now? I was running 4.7 yesterday and it did a remarkably bad job. I recreated my repo state exactly and ran the same starting task with 4.5 (which I have preferred to 4.6). It was even worse, by a large margin. It is likely my task was difficult or poorly posed, but I still have some idea of what 4.5 should have done on it. This was not it. What experiences are other people having with 4.7? How about with other model versions, if they are trying them? (In both cases, I ran on max effort, for whatever that is worth.)

sutterd · 2026-04-20T03:27:39 1776655659

With my use of Claude Code, I find 4.7 to be pretty good about clarifying things. I hated 4.6 for not doing this and had generally kept using 4.5. Maybe they put this in the chat prompt to try to keep the experience similar to before? I definitely do not want this in Claude Code.

mh- · 2026-04-20T05:22:00 1776662520

I agree with your thoughts on 4.6.

It's possible they tried to train this out of it for 4.7 and overcorrected, and the addition to the system prompt is to rein it in a bit.

sutterd · 2026-04-19T02:33:49 1776566029

You don't have to use adaptive thinking. It had been turned off on my main work computer. I was using a different computer on a trip and I started getting so angry at Claude for doing a bad job. I eventually figured out it was adaptive thinking and set it to "hard" and it started working again. At the time I think "hard" was the top choice. With 4.7, my computer now shows "xhard", which I assume is the equivalent setting. There is one higher setting than this, which I haven't tried yet. I would tell you how to change these settings, but I don't remember. By the way, I have been happy with 4.7 so far. I actually did not like 4.6 and preferred 4.5 and used that most of the time until this new release.

scrollop · 2026-04-19T06:14:18 1776579258

"With Opus 4.6, extended thinking was a toggle you managed: turn it on for hard stuff, off for quick stuff. If you left it on, every question paid the thinking tax whether it needed to or not. Now, with Opus 4.7, extended thinking becomes adaptive thinking."

https://claude.com/resources/tutorials/working-with-claude-o...

You want extended thinking? It's now adaptive thinking, and Opus will turn it on if it thinks it needs to. But it probably won't, according to user reports, as tokens are expensive. Except Opus 4.7 now uses 35% more and outputs more thinking tokens.

sutterd · 2026-04-20T03:07:46 1776654466

I am getting pretty good performance. Even on trivial questions it seems to go through the thinking process to the end. If they are using adaptive thinking, it seems to work much better than before. I will see how my experience goes with more usage.

matheusmoreira · 2026-04-19T17:14:12 1776618852

> You don't have to use adaptive thinking.

With Opus 4.7 you absolutely do. Users don't have a choice.

https://code.claude.com/docs/en/model-config

> Opus 4.7 always uses adaptive reasoning. The fixed thinking budget mode and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING do not apply to it.

__s · 2026-04-19T04:22:01 1776572521

/effort

sutterd · 2026-04-16T15:47:19 1776354439

I liked Opus 4.5 but hated 4.6. Every few weeks I tried 4.6 and, after a tirade against it, I switched back to 4.5. They said 4.6 had a "bias towards action", which I think meant it just made stuff up if something was unclear, whereas 4.5 would ask for clarification. I hope 4.7 is more of a collaborator like 4.5 was.

sutterd · 2026-04-07T10:13:05 1775556785

I still use 4.5. I occasionally try 4.6 but always switch back. The “bias towards action” is what I hate. 4.5 would make sure it understands what I want. 4.6 will just make shit up. Maybe the Anthropic people always write crystal clear instructions so it works for them. For me, I just can’t get 4.6 to do what I want.

sutterd · 2026-03-15T19:28:30 1773602910

I was surprised to see a post by Petzold on this subject. I know who he is. But I don’t think you owe an apology here. I think you made a thoughtful comment. A post like his should be critiqued for what it says, not for the author’s previous work. And, fortunately, other people could give context on the significant work he has done.

sutterd · 2026-03-12T17:13:47 1773335627

I am trying a similar spec-driven development idea in a project I am working on. One big difference is that my specifications are not formalized that much. They are in plain language and are read directly by the LLM to convert to code. That seems like the kind of thing the LLM is good at. One other feature of this is that it allows me to nudge the implementation a little with text in the spec outside of the formal requirements. I view it two ways, as spec-to-code but also as a saved prompt. I haven't spent enough time with it to say how successful it is, yet.

issamG · 2026-03-12T18:30:19 1773340219

Do you save these "prompts" so you can improve them, and in turn improve the code? To me, Spec Driven Development is more than a spec to generate code, structured or not.

sutterd · 2026-03-13T04:09:49 1773374989

The spec contains formal, numbered items which are requirements and also serve to make tests (these are spec tests; additional implementation tests are also allowed by the implementer). When I said "they are not formalized as much", I meant I am not as strict on the spec format as CodeSpeak is, where their spec can be parsed with a tool. For me it is up to the LLM to use the spec itself. I have additional text beyond the requirement items which also influences how the LLM implements the code. I did this because it is too tough, for me at least, to prompt the LLM just based on strict requirements. This is perhaps cheating according to what you might call SDD. I'm just trying to be practical. The idea in the end is that this spec implies the code and maintaining the spec is the same as maintaining the code. Strictly speaking this won't be true, but I am hoping it still works anyway.

sutterd · 2026-03-07T20:00:42 1772913642

Is wealth the right term here? I thought it was supposed to measure production, with the actual measurement usually spending (with qualifiers). And, when comparing countries, you have to account for the different currencies. Currencies are typically trade balanced, which gives a rough equivalence for buying power, but that is not true with the dollar because, as the effective reserve currency, it has international demand outside of trade.

skybrian · 2026-03-07T21:20:25 1772918425

I suspect that the US having better investment opportunities than other countries (tech companies for example) might be more important than reserve currency status.

People tend to pay more attention to trade than investment, but investment flows are just as important. A trade deficit often means that foreign investors are buying and a trade surplus goes along with people investing in foreign countries.

BurningFrog · 2026-03-09T16:37:55 1773074275

Something like "economic value" is probably a more precise term than "wealth".