
There's a bunch of benchmarks on the intro page including AIME 2025 without tools, SWE-bench Verified, Aider Polyglot, MMMU, and HealthBench Hard (not familiar with this one): https://openai.com/index/introducing-gpt-5/

Pretty par-for-the-course setup for evals at launch.


