Got invited to try this, but it was too expensive. I gave it two tasks that I would expect Codex 5.3 xhigh to spend $1-2 of tokens on. It used $20 on each, and one run was on medium while the other was on xhigh!
Yeah. I've always loosely correlated pelican quality with big model smell but I'm not picking that up here. I thought this was supposed to be spud? Weird indeed.
Hmm. Any idea why it's so much worse than the other ones you have posted lately? Even the open weight local models were much better, like the Qwen one you posted yesterday.
Makes sense. HF deserves the same awe as radioactive material. I've always found both fascinating: like some kind of dark magic that curses you if you come into contact with it.
America would ban electric vehicles if it could.
The petrodollar is on the ropes.
And news like "flash charging" is going to further fuel the anti-China hype machine. The real kicker, though, will be an econobox with a 500-mile range and a 5-minute charge for $15,000, or $20,000 with a solar roof for home charging.
Model Welfare?
Are they serious about this? Or is it just more hype?
I really don't trust anything this company says anymore.
"We have a model that is too dangerous to release" is like me saying that I have a billion dollars in gold that nobody is allowed to see but I expect to be able to borrow against it.
Maybe referring to it as welfare is odd, but these points are important. It isn't a good look to have a model that tends to get into self-deprecating loops, like one of Google's older models did; it's an even worse look, and a potential legal liability, if your model becomes associated with a suicide. An overly negative chat model would also just be unpleasant to use.
With the weights being mostly opaque, these kinds of evaluations are an important piece of reducing the harm an AI model can cause.
I feel that anthropomorphizing the model is also potentially very harmful; we've seen that in the LLM interactions that end in tragedy. It's the wording that bothers me.