Browser Agent Benchmark: Comparing LLM models for web automation

wiradikusuma · 2026-02-01T05:45:36 1769924736

Since we're on this topic, can anyone suggest a good AI-based tool for exploratory (fuzzy?) web testing?

pixel_popping · 2026-02-01T05:24:52 1769923492

The benchmark is missing the best model (Opus 4.5) tho.

djohnston · 2026-02-01T15:13:26 1769958806

Yeah but then their own product might not score the highest.

pixel_popping · 2026-02-02T13:43:26 1770039806

That's exactly why I'm pointing it out. Leaving it out feels a bit corrupt, but understandable.

djohnston · 2026-02-02T15:31:24 1770046284

tbh i was a bit cranky yesterday - even if they are #2 on a legit benchmark that would be impressive