LLM benchmarks are largely irrelevant when it comes to "state of the art". They can tell you when a model does poorly, but they're not a reliable signal that it does well.

Open-weights models are still lagging quite a bit behind SOTA. E.g. there's still no open model that can match GPT-5 Pro or Gemini 2.5 Pro, and the latter is almost a year old by now.

Not true. For example, I think even Gemini 3 Pro can't match Gemini 2.5 Pro. Without benchmarks, it's all just personal taste.


