WarmWash
26 days ago
on:
Sam Altman's response to Molotov cocktail incident
GLM 5.1, widely held up as the model at the heels of, perhaps even surpassing, Western models...
Gets 5% on ARC-AGI2 private set.
Chinese models are suspiciously good at benchmarks.
ctolsen
25 days ago
I mean, I could say the same about Gemini. 3.1 Pro tops a bunch of benchmarks out there, but in every practical use I've put it to, it underperforms both other proprietary models and open-weight models. Benchmarks are suspicious in general.