Got invited to try this, but it was too expensive. I gave it two tasks that I would expect Codex 5.3 xhigh to spend $1-2 of tokens on. It used $20 on each, and one run was on medium while the other was on xhigh!
Yeah. I've always loosely correlated pelican quality with big model smell but I'm not picking that up here. I thought this was supposed to be spud? Weird indeed.
Hmm. Any idea why it's so much worse than the other ones you have posted lately? Even the open weight local models were much better, like the Qwen one you posted yesterday.
Makes sense. HF deserves the same awe as radioactive material. I've always found both fascinating: like some kind of dark magic that curses you if you come into contact with it.
America would ban electric vehicles if it could.
The petrodollar is on the ropes.
And news like "flash charging" is going to further fuel the anti-China hype machine. The real kicker, though, will be an econobox with a 500-mile range and a 5-minute charge for $15,000, or $20,000 with a solar roof for home charging.
Model Welfare?
Are they serious about this? Or is it just more hype?
I really don't trust anything this company says anymore.
"We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.
Maybe referring to it as welfare is odd, but these points are important. It isn't a good look to have a model that tends to get into self-deprecating loops, like one of Google's older models did; it's an even worse look, and a potential legal liability, if your model becomes associated with a suicide. An overly negative chat model would also just be unpleasant to use.
With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.
I feel that anthropomorphizing the model is also potentially very harmful; we've seen that in the LLM interactions that end in tragedy. It's the wording that bothers me.