I have yet to find an LLM that correctly solves the 24 puzzle for the numbers 6, 9, 9, 10.
Many seemingly capable LLMs produce incorrect intermediate or final results, e.g. claiming 9 × 9 ÷ (10 - 6) = 24, when that expression actually evaluates to 81 ÷ 4 = 20.25.
Did I miss anything in my prompt? Or is there a reason LLMs are bad at these questions?
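
For reference, the puzzle is solvable: 9 × 10 ÷ 6 + 9 = 24. Below is a minimal brute-force sketch for checking candidate answers. It assumes the usual rules (each number used exactly once, with +, -, ×, ÷ and parentheses), and the function names `solve24` and `_search` are my own, not from any library. It uses exact fractions so that division cannot introduce rounding errors.

```python
from fractions import Fraction


def solve24(nums, target=24):
    """Return one expression over nums that equals target, or None.

    Assumed rules: every number is used exactly once; only + - * /.
    """
    items = [(Fraction(n), str(n)) for n in nums]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left, so compare it with the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick an ordered pair (a, b), combine it, and recurse on the smaller list.
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]

            candidates = [
                (a + b, f"({ea} + {eb})"),
                (a - b, f"({ea} - {eb})"),
                (a * b, f"({ea} * {eb})"),
            ]
            if b != 0:  # skip division by zero
                candidates.append((a / b, f"({ea} / {eb})"))

            for candidate in candidates:
                found = _search(rest + [candidate], target)
                if found is not None:
                    return found
    return None


if __name__ == "__main__":
    # The answer some LLMs give: 9 * 9 / (10 - 6)
    print(Fraction(9 * 9, 10 - 6))  # 81/4, i.e. 20.25, not 24
    print(solve24([6, 9, 9, 10]))   # prints some valid expression
```

The search tries every ordered pair and every operator, so it is exhaustive for four numbers. That makes it useful for verifying an LLM's output even though it says nothing about why the model got it wrong.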