LLM benchmarks are largely irrelevant when it comes to "state of the art". They can tell you when a model does poorly, but they're not a reliable signal that it does well.

Open-weights models are still lagging quite a bit behind SOTA. E.g. there's still no open model that can match GPT-5 Pro or Gemini 2.5 Pro, and the latter is almost a year old by now.

Not true. For example, I think even Gemini 3 Pro can't match Gemini 2.5 Pro. Without benchmarks, it's all just personal taste.


