Modelling Uncertainty in the Game of Go
- David Stern
Go is an ancient Chinese game of perfect information whose complexity has defeated attempts by Artificial Intelligence researchers to automate play. Uncertainty about the future course of the game arises from the sheer complexity of the game tree and the limited computational resources available to computers (and humans). This thesis presents a number of models which use probability theory to represent and manage this uncertainty.
Firstly, the task of obtaining a probability distribution over the available moves in a Go position is considered. A predictive model is trained from records of historical games between expert human players. Each move is represented by the exact pattern of stones in its local neighbourhood on the board, and millions of these patterns are harvested automatically from the game records. The system can produce a distribution over all the legal moves in a position at a rate of hundreds of positions per second, and in 34% of test positions the model's most probable move is exactly the move chosen by the expert player. A hierarchical version of the model, with improved predictive performance, is also developed.
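As a rough illustration of this chapter's approach, the sketch below scores each legal move by a learned weight for its exact local pattern and normalises the scores with a softmax. The thesis itself uses a Bayesian ranking model over pattern values; the names pattern_weights and extract_pattern are placeholders for the harvested pattern table and the local-pattern matching step, and are not taken from the thesis.

    import numpy as np

    def move_distribution(position, legal_moves, pattern_weights, extract_pattern):
        """Return a probability distribution over the legal moves of a position.

        Each candidate move is keyed by the exact local stone pattern around it;
        patterns never seen during training fall back to a default weight of zero.
        A softmax over the weights gives the distribution (a simplification of
        the Bayesian ranking model used in the thesis).
        """
        scores = np.array([
            pattern_weights.get(extract_pattern(position, move), 0.0)
            for move in legal_moves
        ])
        exp_scores = np.exp(scores - scores.max())  # subtract the max for numerical stability
        return exp_scores / exp_scores.sum()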
Next, probability theory is used to solve local tactical Go problems. Such problems are solved by searching the game tree until enough information has been gathered to prove by logical deduction whether a particular goal can be achieved. Before a proof is found, probabilities represent degrees of belief about whether further reasoning about a position will lead to a proof, and these beliefs are used to guide the search. The move prediction model of the previous chapter is used to incorporate domain knowledge into the search, and the system can learn from previous search tasks how to solve future, unseen problems more efficiently.
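The following sketch shows only the belief-guided ordering idea: nodes believed most likely to lead to a proof are expanded first. The prior_prob, expand, is_proven and is_disproven callables are assumptions standing in for the move prediction model and the problem-specific goal tests, and the AND/OR bookkeeping required for a genuine proof is omitted.

    import heapq

    def guided_search(root, prior_prob, expand, is_proven, is_disproven, budget=10000):
        """Best-first search in which beliefs about reaching a proof guide expansion.

        prior_prob(node) is a heuristic probability (e.g. derived from the move
        prediction model) that expanding the node will lead towards a proof of
        the goal.  Nodes with the highest belief are expanded first.
        """
        frontier = [(-prior_prob(root), 0, root)]  # max-heap via negated priorities
        pushed = 0
        while frontier and pushed < budget:
            _, _, node = heapq.heappop(frontier)
            if is_proven(node):
                return node                        # a goal-achieving line was found
            if is_disproven(node):
                continue                           # prune refuted nodes
            for child in expand(node):
                pushed += 1
                heapq.heappush(frontier, (-prior_prob(child), pushed, child))
        return None                                # no proof found within the budget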
Thirdly, this thesis considers the task of predicting the final territory outcome of a Go game from a given mid-game position. A set of Boltzmann machine models, trained on records of expert games, is applied to this task. A generalised version of the Swendsen-Wang sampling algorithm is used to perform efficient inference. Go players confirm that the territory predictions made by these simple models are reasonable.
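For intuition about the sampler, the sketch below performs one standard Swendsen-Wang sweep over a grid of binary "territory" variables with a uniform coupling strength beta: bonds between equal neighbours are activated with probability 1 - exp(-2*beta), and the resulting clusters are flipped together. The generalised version used in the thesis allows position-dependent couplings and external fields, which are not shown here.

    import numpy as np

    def swendsen_wang_step(spins, beta, rng):
        """One Swendsen-Wang sweep on a 2D grid of +/-1 spins (Ising-style model)."""
        n, m = spins.shape
        parent = np.arange(n * m)                   # union-find over grid sites

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]       # path halving
                i = parent[i]
            return i

        def union(i, j):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

        p_bond = 1.0 - np.exp(-2.0 * beta)
        for x in range(n):
            for y in range(m):
                i = x * m + y
                for dx, dy in ((1, 0), (0, 1)):     # right and down neighbours
                    xx, yy = x + dx, y + dy
                    if xx < n and yy < m and spins[x, y] == spins[xx, yy]:
                        if rng.random() < p_bond:
                            union(i, xx * m + yy)

        flip = {}                                   # flip each cluster with probability 1/2
        for i in range(n * m):
            r = find(i)
            if r not in flip:
                flip[r] = rng.random() < 0.5
            if flip[r]:
                spins[i // m, i % m] *= -1
        return spins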
In the final chapter, some alternative Bayesian versions of Monte Carlo planning algorithms are tested on synthetic game trees. With the hyperparameters set correctly, a simple Gaussian counting model finds the best move faster than the currently popular UCT algorithm (Upper Confidence bounds applied to Trees).
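A minimal sketch of one such Bayesian alternative follows: each move keeps a Gaussian posterior over its value (a conjugate update under an assumed known observation noise), and the next move to simulate is chosen by Thompson sampling from those posteriors. This is not the thesis's exact selection rule; the prior and noise variances are the kind of hyperparameters referred to above, and their values here are purely illustrative.

    import numpy as np

    def select_move(stats, rng, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        """Choose a move by Thompson sampling from Gaussian posteriors over move values.

        stats maps each move to (sum_of_returns, visit_count).  With a Gaussian
        prior and Gaussian observation noise the posterior over a move's value
        is also Gaussian, so the update is a simple precision-weighted average.
        """
        best_move, best_sample = None, -np.inf
        for move, (total, count) in stats.items():
            post_var = 1.0 / (1.0 / prior_var + count / noise_var)
            post_mean = post_var * (prior_mean / prior_var + total / noise_var)
            sample = rng.normal(post_mean, np.sqrt(post_var))
            if sample > best_sample:
                best_move, best_sample = move, sample
        return best_move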