This eval’s goal is a bit unclear to me, especially given the example questions. They’re very much trivia/minutiae, like asking about sports goals, which I guess fits their stated desire to test factual knowledge. But will an LLM ever be able to answer these without web browsing, which they deliberately removed for the evaluation?