GPT 5.5 and 5.4 are such great models. I just tried Opus 4.8 and it took only 30 minutes to run into the kind of laziness that drives me crazy. 5.5 just doesn’t have this issue.
How do you compare them to 5.3 Codex? I have been using 5.3 Codex for a while, and subjectively I think it does a better job than Opus 4.6/4.7 at a fraction of the cost. I did give 5.5 a try, and it seems a bit better but orders of magnitude more expensive.
5.3 is good but talks like a robot; it’s too hard to understand what exactly it’s talking about. When using droid I use it as a worker model, and it does a great job.
All 5.x models suffer from weirdness in the way they write, but 5.5 and 5.4 are much better and now offer a good balance: direct, but without being like Claude.
Today they demoed 8i running at 1300 to 1600ish tokens per second. I imagine that comes from having a single rack serving the model just for the demo.
There's a limit to how much you can "scale" this process, since it's linear. But napkin math based on vLLM says parallel batched streams only lose around 50% performance compared to single-stream output, so a dedicated rack doesn't explain the ridiculously fast numbers here.
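To put rough numbers on that napkin math, here is a minimal sketch. It assumes the demo figures are single-stream rates and takes the ~50% batching loss at face value; both are guesses from this thread, not measurements, and the function name is made up for illustration.

```python
# Napkin math only: the 50% figure is the rough estimate quoted above,
# not a benchmark result.
BATCHING_LOSS = 0.5


def per_stream_rate_when_batched(single_stream_tps: float,
                                 loss: float = BATCHING_LOSS) -> float:
    """Estimate per-stream tokens/s once the server batches many streams."""
    return single_stream_tps * (1.0 - loss)


# Low and high end of the demoed range (assumed to be single-stream).
demo_rates = [1300, 1600]
batched_rates = [per_stream_rate_when_batched(r) for r in demo_rates]

for demo, batched in zip(demo_rates, batched_rates):
    print(f"{demo} tok/s single-stream -> about {batched:.0f} tok/s per stream when batched")
```

Even with the batching penalty applied, that still leaves roughly 650 to 800 tokens per second per stream, so the dedicated rack alone would not account for the speed.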
I wish Google just came out and told us how large their flash model is, because if it's as big as or smaller than gpt-5.4-nano, that's the real headline here.
Yeah, for 10-60 BILLION, which again makes this even stupider.
For this amount of money you can rebuild Cursor and everything else on the market, and with the remaining 9-59 billion you just hire coding experts and let them write real, high-quality code examples.
And then you just use your existing Grok pipeline and add this functionality.
Buy "Cursor", not "Cursor's IP". This means brand, users, and a shitton of data.
And if you combine a shitton of data with a lot of compute, a large userbase, and good engineers, you have a pretty good chance of doing something interesting.
I expected the same outcome you're describing here, but in my experience this hasn't been the case. I've been researching new acoustic guitars to purchase, and I've been getting an equal number of suggestions from the major brands and the small brands.
Part of it, though, is that I'm giving lots of context (e.g. guitar player for 10+ years, huge Opeth fan, looking for something with as close to an Ibanez style neck as possible under $1000).
I think the guitar market is kind of an exception, because it is pretty normal for guitar players to search for "guitar like Fender but cheaper". There are tons of Reddit/forum discussions about this, and those small brands are actually very well known in the community, because the majority of guitar players play cheap instruments. YouTuber Phillip McKnight has often said that cheap guitars move in ridiculous volumes compared to more expensive ones like Gibson or Fender.
I think if you ask something generic like “shoes”, this could be true.
When I’ve worked with Claude on finding brands for fashion (e.g. here’s a small watchmaker I like, what are similar options?) it does research and picks great options. Some are big, others are small producers.
I kind of feel the same. I’m learning things and doing things in areas that I would otherwise just skip due to lack of time or fear.
But I’m so much more detached from the code. I don’t feel that ‘deep neural connection’ that comes from actually spending days locked in a refactor or debugging a really complex issue.
I strongly agree on the refactor, but for debugging I have another perspective: I think debugging is changing for the better, so it looks different.
Sure, you don't know the code by heart, but people debugging code translated to assembly already do that.
The big difference is being able to unleash scripts that analyze the data and invalidate an enormous number of hypotheses very fast.
Doing that by hand used to take hours, so it was a last-resort approach. Now it's very cheap, so checking many hypotheses is practical!
I feel like my "debugging ability" in terms of value delivered has gone way up. As for skill, it's changing, so I cannot tell.
Speaking as someone who switched from mobile to web dev professionally 6 months ago: If you care about code quality, you'll develop that neural connection after some time.
But if you don't and there's no PR process (side projects), the motivation to form that connection is quite low.
> If you care about code quality, you'll develop that neural connection after some time.
No, because you can get LLMs to produce high quality code that has gone through an infinite number of refinement/polish cycles and is far more exhaustive than the code you would have written yourself.
Once you hit that point, you find yourself in a directional/steering position divorced from the code since no matter what direction you take, you'll get high quality code.
I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.
It’s funny how you train a machine to mimic human behavior, then the marketing team decides to promote it with “Look! It’s human! Look how it’s thinking about existence!” while a huge percentage of human-produced content is exactly about the uncertainty of human existence, and that content got used to train the model.
I see us collectively forgetting the training process as time goes on, and I think that explains why people get so surprised by some pretty obvious outcomes of said training. Perhaps also why people keep anthropomorphising these outcomes.
No, it's not very good. But when you run out of Claude tokens it's perfectly fine for small stuff.
Cursor's inline autocomplete is very good though, much better than anything I could reproduce in Zed with various 3rd party "edit" LLMs (although checking Google, they have announced a new model since I tried it: https://zed.dev/blog/zeta2).
I ran the same prompt in parallel with Composer 2 and GPT 5.3 Codex. Composer did slightly better in terms of variable naming and extra tweaks to loosely related files to keep the codebase consistent.