frtime3d's comments

frtime3d · 2025-10-16T01:55:41

> If they specifically tried to cheat at this benchmark it would be obvious and they would be called out

I doubt it. Most would just go “Wow, it really looks like a pelican on a bicycle this time! It must be a good LLM!”

Most people trust a benchmark if it seems like a reasonable test of something they assume may be relevant to them. A pelican on a bicycle may not be something they would necessarily want, but they do want an LLM that could produce one.