
Why not? One of the most intelligent things to do when stuck on a problem is to get outside help.

If allowing this behaviour raises a problem, you can always add constraints to the benchmark, such as "final answer must come out in under 15s". The LLM can then decide whether to ask around in accordance with the time risk.



Because AIs are good at devolving to whatever gets the highest score, regardless of test intent. For most problems "ask_hooman", or especially the plural (asking several humans), would be much more effective. So the degenerate case would dominate and tell you precisely zero about the intelligence of the AI. If a specific "tool" is more adept than the "AI", then "choose tool" will always be the correct answer. But I agree, a tight time constraint would help.
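
To make the degenerate case concrete, here is a rough sketch of how a score-maximizing policy would pick tools with and without a deadline. Everything in it is an assumption for illustration: the `Tool` class, the `think_alone` tool, and the accuracy and latency numbers are made up, and only the "ask_hooman" name and the 15s limit come from the thread.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tool:
    name: str
    accuracy: float   # assumed probability of a correct answer (made up)
    latency_s: float  # assumed seconds until the answer arrives (made up)


def expected_score(tool: Tool, deadline_s: Optional[float] = None) -> float:
    """Expected benchmark score; an answer that misses the deadline scores zero."""
    if deadline_s is not None and tool.latency_s > deadline_s:
        return 0.0
    return tool.accuracy


def best_tool(tools: Sequence[Tool], deadline_s: Optional[float] = None) -> Tool:
    """What a pure score maximizer would choose, regardless of test intent."""
    return max(tools, key=lambda t: expected_score(t, deadline_s))


# Hypothetical numbers: the human is more accurate but much slower.
TOOLS = [
    Tool("think_alone", accuracy=0.6, latency_s=5.0),
    Tool("ask_hooman", accuracy=0.95, latency_s=120.0),
]

print(best_tool(TOOLS).name)                   # ask_hooman
print(best_tool(TOOLS, deadline_s=15.0).name)  # think_alone
```

With no deadline the maximizer always delegates, so the score mostly measures the human. With the 15s limit, delegation only wins when the helper is both better and fast enough, which is the trade-off the benchmark would presumably want the model to reason about.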



