It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.