Hacker News new | past | comments | ask | show | jobs | submit login

They just write "it's like alpha zero". So presumably they used a version of MCTS where each terminal node is scored by LEAN as either correct or incorrect.

Then they can train a network to evaluate intermediate positions (score network) and one to suggest things to try next (policy network).




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: