Hacker News | new | past | comments | ask | show | jobs | submit | saberience's comments

Linking a tweet by Karpathy is a value add??

Karpathy calling OpenClaw a claw is an act of genius level naming?

Jesus wept.


The value add here is in highlighting and encouraging a new piece of terminology in the AI space which I think is genuinely useful.

I think it's a good idea to define a name for the category of personal digital assistant agents that fit the general shape pioneered by OpenClaw. And "Claw" fits the bill.


Can we please move past this whole OpenClaw hype?

Yes, it's an LLM in a loop that can call tools. This also existed six months and a year ago, and it was called an AI agent.

And yes, we can all vibe code them in 1,000, 2,000, or 10,000 lines of code in Zig, Rust, or even C.
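For what it's worth, the shape being complained about really is that small. A rough Python sketch, where `call_llm` and the tool table are made-up stand-ins rather than any real model API:

```python
# Minimal "LLM in a loop with tools" sketch. call_llm is a stub standing
# in for any real model API; the point is the shape, not the model.

def call_llm(messages):
    # Stub: a real implementation would call a model API here.
    # Pretend the model asks for one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"done": True, "answer": messages[-1]["content"]}

# Tool registry: name -> callable. Real agents expose shell, files, etc.
TOOLS = {"add": lambda a, b: a + b}

def agent(task, max_turns=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        if reply.get("done"):
            return reply["answer"]
        # Execute the requested tool and feed the result back in.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("ran out of turns")
```

That loop, plus a cron trigger, is the whole "agent" pattern being renamed.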

Game over man. Game over.


Does he?

Claw is a terrible name for a basic product that is Claude Code in a loop (a cron job).

This whole hype cycle is absurd and ridiculous for what is a really basic product full of security holes and entirely vibe coded.

The name won't stick, and when Apple or someone else releases a polished version that consumers actually use in two years, I guarantee it won't be called “iClaw”.


The term “claw” for an agent in a loop is the most ridiculous thing I’ve heard in some time.

Why are Karpathy and SimonW trying to push new terms on us all the time? What are they trying to gain from this weird ass hype cycle?


I'm actually sure it's not going to stick; it's a ridiculous name that has nothing to do with the actual product.

I almost guarantee no one will be using this term in two years.

Claws? It sounds stupid, and the average consumer hates stupid-sounding terms, which is the same reason Microsoft's “Zune” never caught on.


I hope we keep beating this dead horse some more, I'm still not tired of it.

Benchmarks aren't everything.

Gemini consistently has the best benchmarks but the worst actual real-world results.

Every time they announce the best benchmarks, I try their tools and products again, and each time I immediately go back to the Claude and Codex models, because Google is just so terrible at building actual products.

They are good at research and benchmaxxing, but the day to day usage of the products and tools is horrible.

Try using Google Antigravity and you will not make it an hour before switching back to Codex or Claude Code, it's so incredibly shitty.


That's been my experience too; can't disagree. Still, when it comes to tasks that require deep intelligence (esp. mathematical reasoning [1]), Gemini has consistently been the best.

[1] https://arxiv.org/abs/2602.10177


What’s so shitty about it?

Please no, let's not.

I always try Gemini models when they get updated with their flashy new benchmark scores, but always end up using Claude and Codex again...

I get the impression that Google is focusing on benchmarks but without assessing whether the models are actually improving in practical use-cases.

I.e. they are benchmaxing

Gemini is "in theory" smart, but in practice is much, much worse than Claude and Codex.


I exclusively use Gemini for Chat nowadays, and it's been great mostly. It's fast, it's good, and the app works reliably now. On top of that I got it for free with my Pixel phone.

For development I tend to use Antigravity with Sonnet 4.5, or Gemini Flash if it's about a GUI change in React. The layouts and designs Gemini produces have been superior to the Claude models', in my opinion, at least at the time. Flash also works significantly faster.

And all of it is essentially free for now. I can even select Opus 4.6 in Antigravity, but I haven't given it a try yet.


> but without assessing whether the models are actually improving in practical use-cases

Which cases? Not trying to sound harsh, but you didn't even give examples of the cases you use Claude/Codex/Gemini for.


I find Gemini is outstanding at reasoning (all topics) and architecture (software/system design). On the other hand, Gemini CLI sucks and so I end up using Claude Code and Codex CLI for agentic work.

However, I heavily use Gemini in my daily work and I think it has its own place. Ultimately, I don't see the point of choosing the one "best" model for everything, but I'd rather use what's best for any given task.


Honestly doesn't feel like Google is targeting the agentic coding crowd so much as they are the knowledge worker / researcher / search-engine-replacement market?

Agree Gemini as a model is fairly incompetent inside their own CLI tool as well as in opencode. But I find it useful as a research and document analysis tool.


For my custom agentic coding setup, I use Claude Code derived prompts with Gemini models, primarily flash. It's night and day compared to Google's own agentic products, which are all really bad.

The models are all close enough on the benchmarks, and I think people are attributing too much of the difference in the agentic space to the model itself. I strongly believe the difference is in all the other stuff, which is why Anthropic is far ahead of the competition. They have done great work with Claude Code, Cowork, and their knowledge sharing through docs and blog posts, bar none on this last point, imo.


I'm glad someone else is finally saying this. I've been mentioning it left and right, and sometimes I feel like I'm going crazy that more people aren't noticing it.

Gemini can go off the rails SUPER easily. It just devolves into a gigantic mess at the smallest sign of trouble.

For the past few weeks, I've also been using XML-like tags in my prompts more often. Sometimes preferring to share previous conversations with `<user>` and `<assistant>` tags. Opus/Sonnet handles this just fine, but Gemini has a mental breakdown. It'll just start talking to itself.
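For concreteness, this is the sort of prompt shape I mean. The tag names are just the ones above, and `transcript_to_prompt` is a throwaway helper I'm making up here, not part of any SDK:

```python
# Illustrative only: packing a prior conversation into XML-like tags,
# the prompt shape described above. Nothing here calls a real model.

def transcript_to_prompt(turns):
    parts = []
    for role, text in turns:
        parts.append(f"<{role}>{text}</{role}>")
    return "\n".join(parts)

prompt = transcript_to_prompt([
    ("user", "Summarize our plan."),
    ("assistant", "We agreed to refactor the parser first."),
    ("user", "Now draft the refactor steps."),
])
```

Opus/Sonnet takes a transcript like this in stride; Gemini is the one that starts role-playing both sides.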

Even in totally ordinary sessions, it goes crazy. After a while, it'll start saying it's going to do something, and then pretend it's done that thing, all in the same turn. A turn that never ends. Eventually it just starts spouting repetitive nonsense.

And you would think this is just because models tend to get worse as the context grows. But no! This can happen well below even the 200,000-token mark.


Flash is (was?) better than Pro on these fronts.

Heh, I find Codex to be a far, far smarter model than Claude Code.

And there's a good reason the most "famous" vibe coders, including the OpenClaw creator, all moved to Codex: it's just better.

Claude writes a lot more code to do anything: tons of redundant code, repeated code, etc. Codex is the only model I've seen that occasionally removes more code than it writes.

