Hacker News | deflator's comments

Got invited to try this, but it was too expensive. I gave it two tasks that I would expect Codex 5.3 xhigh to take $1-2 of tokens on. It used $20 on each, and one was on medium with the other on xhigh!

They are not good, and they seem to get worse as you increase effort. Weird.

Yeah. I've always loosely correlated pelican quality with big model smell but I'm not picking that up here. I thought this was supposed to be spud? Weird indeed.

No but I can sense the movement, I think it's already reached the level of intelligence that draws it towards futurism or cubism /s

Hmm. Any idea why it's so much worse than the other ones you have posted lately? Even the open weight local models were much better, like the Qwen one you posted yesterday.

The xhigh one was better, but clearly OpenAI have not been focusing their training efforts on SVG illustrations of animals riding modes of transport!

It beats opus-4.7 but looks like open models actually have the lead here.

Makes sense. HF deserves the same awe as radioactive material. I've always found both fascinating. Like some kind of dark magic that curses you if you come into contact with it.

Makes my Mustang Mach-E feel super outdated. I wonder if the American car companies will be able to deliver this technology any time soon.

America would ban electrics if they could; the petrodollar is on the ropes. News like "flash charging" is going to feed the anti-China hype machine, but the real kicker will be an econobox with a 500-mile range and a 5-minute charge for $15,000, or $20,000 with a solar roof for home charging.

Model Welfare? Are they serious about this? Or is it just more hype? I really don't trust anything this company says anymore. "We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.

Maybe referring to it as welfare is odd, but these points are important. It isn't a good look to have a model that tends to get into self-deprecating loops like one of Google's older models did; it's an even worse look, and a potential legal liability, if your model becomes associated with a suicide. An overly negative chat model would also just be unpleasant to use.

With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.


I feel that anthropomorphizing the model is also potentially very harmful. We've seen that in the LLM interactions that end in tragedy. It's the wording that bothers me.

Reminds me of the co-op mod for Half-Life 1 years back. It completely removed PvP and set dozens of players against puzzles and waves of enemies.

Very interesting. I think I will try this with my Codex use

Has anyone on here tried putting this in their AGENTS.md, or similar?


Yes, I have this in my CLAUDE.md and it's picked up correctly.
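For anyone who hasn't set one up: the article's exact instruction isn't quoted in this thread, but a CLAUDE.md is just a freeform markdown file at the project root that the agent ingests at session start. A minimal sketch (the specific bullet points here are made up for illustration):

```markdown
<!-- CLAUDE.md at the project root; the agent reads this on startup -->
## Working style
- Prefer small, focused diffs; don't reformat unrelated code.
- Run the test suite before declaring a task done.
```

The same shape works for AGENTS.md with tools that read that file instead.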

Sometimes I have to re-ingest it - as the model CAN forget it.

Or it has suppressed the signal.


Ironic that the website hosting this article features the same kind of bloat and ads the article complains about.


The sky is falling, the sky is falling!


The sky falling is largely solved.


With blockchain & DeFi?

:-D


build me facebook, no mistakes.

