They just write "it's like alpha zero". So presumably they used a version of MCT...

thomasahle 3 months ago | parent | context | favorite | on: AI solves International Math Olympiad problems at ...

They just write "it's like alpha zero". So presumably they used a version of MCTS where each terminal node is scored by LEAN as either correct or incorrect.

Then they can train a network to evaluate intermediate positions (score network) and one to suggest things to try next (policy network).