Deep TD-Go
Code
Motivation: The idea was to pretrain an MLP using deep learning on Go board positions from professional games and then train it to play Go using temporal-difference learning in the manner of TD-Gammon. The project was done under the advisement of Professor Charles Isbell.
In the end, learning a good value function for Go proved quite difficult, but I was able to show that the weights learned by deep learning were more useful for classifying final board positions than those of randomly initialized networks. I also learned that one idea is not enough: a great project requires several.