DESIGN AND REALISATION OF A PROGRAM FOR PLAYING TANTRIX
Pieter Bolle
Thesis submitted for the degree of Master of Engineering, 2004–2005
Supervisor: Prof. Dr. D. De Schreye
Faculteit Toegepaste Wetenschappen, Departement Computerwetenschappen
Celestijnenlaan 200A, B-3001 Leuven, (016) 32 77 00
K.U. Leuven, Academic year 2004–2005
Name and first name student: Bolle Pieter
Title: Design and realisation of a program for playing Tantrix
Dutch translation: Ontwerp en realisatie van een programma voor het spelen van Tantrix
ACM Classification: I.2.8
AMS Classification: 68T20
Abstract: In this thesis, a new robot for playing Tantrix was designed and implemented. Tantrix is a strategic board game invented by Mike McManaway. It consists of fifty-six hexagonal tiles with three coloured lines on them, which have to be linked to make the longest line or loop. Minimax-based search techniques were combined with heuristics measuring real, virtual and fair share line length to build a robot that plays Tantrix at an advanced level. This robot, called Oliver, currently plays on Tantrix.com. Furthermore, it was investigated how these algorithms can be used in a distributed system.
Thesis submitted for the degree of Master of Engineering in Computer Science
Supervisor: Prof. Dr. D. De Schreye
Readers: ir. Ruben Vandeginste, Prof. Assist. A. Biryukov
Counsellor: Prof. Assist. A. Biryukov
Department DTAI
Acknowledgements

I would like to thank my supervisors Prof. Danny De Schreye and Prof. Alex Biryukov for bringing me into contact with this interesting domain and subject. I was always welcome in their offices for enlightening discussions.

I would also like to thank the Tantrix team: Mike McManaway for inventing this fascinating game, Dave Dyer for programming GoodBot and supporting me with programming-related issues, and Britta Steude for answering mountains of mail. I hope that, with their enthusiasm, they will succeed in spreading Tantrix to the four corners of the world.

Thanks to Rob Morton, for introducing me to Tantrix and for his insightful comments, which were very helpful. I wish him many more years at the top of Tantrix's world rankings and more tournament wins in the future. I would also like to thank Matt Peek, former world champion and Tantrix grand master, for reviewing some parts of this text.

Furthermore, I would like to thank my fellow students and friends for the many nice moments we had while studying and working together, but especially for the great moments when we were not working at all. Many thanks to Mathieu, Melanie, Hilke, Joris, Wim, Yves, Lies, Eveline, Liesbeth, Mathieu and Eline, Stijn and Saskia. Thanks to all the people who lived next to me in the last five years: Kurt, Frederik and Stefanie, Christophe, Koen, Alain, Yanick, Stijn and Wouter.

Last but not least, I would like to thank my parents for giving me the opportunity to study, and my sister Lieselot. A special word of thanks goes to Nele and her parents for their kindness and support.

Pieter Bolle
Leuven, May 2005
Contents

Introduction 1

1 Tantrix history and game description 3
  1.1 Tantrix history 3
  1.2 Description of the game 3
    1.2.1 Tantrix tiles 3
    1.2.2 Objectives and the start of a new game 4
    1.2.3 The rules of the game 6

2 Previous work 10
  2.1 Existing robots 10
    2.1.1 History of the robots playing Tantrix 10
    2.1.2 How Robot and GoodBot work 10
  2.2 Human strategy 11
    2.2.1 Startgame 11
    2.2.2 Midgame 12
    2.2.3 Endgame 14
    2.2.4 Global strategies 14

3 Algorithms 18
  3.1 Search algorithms 18
    3.1.1 The Tantrix game tree 18
    3.1.2 Static evaluation 20
    3.1.3 Minimax 20
    3.1.4 Minimax adapted to Tantrix 21
    3.1.5 Alphabeta search 26
    3.1.6 Alphabeta adapted to Tantrix 28
  3.2 Problems with the search algorithms 30
  3.3 Heuristics 30
    3.3.1 Real line length 31
    3.3.2 Virtual line length 32
    3.3.3 Fair share line length 34
    3.3.4 Iterative filling of interesting places score 37
    3.3.5 Combination of the heuristics 37
  3.4 The problem with genetic algorithms 40
  3.5 A distributed approach 40

4 Design and implementation 42
  4.1 Interface details 42
    4.1.1 Details of the game status representation 44
    4.1.2 Advantages and limitations of the interface 45
  4.2 Design of Oliver 47
  4.3 Statistics of the game 47
    4.3.1 Storing information 47
    4.3.2 Retrieving information 49
  4.4 A distributed implementation 50
    4.4.1 'Monte Carlo' evaluation 50
    4.4.2 An alternative approach 50
    4.4.3 The difference with the 'iterative filling of interesting places' heuristic 51
    4.4.4 Implementation of the 'Monte Carlo' approach 51

5 Results 54
  5.1 Scoring methods 54
    5.1.1 Winning percentage 54
    5.1.2 Tournament score 56
    5.1.3 Lobby ranking 56
    5.1.4 Master ranking and ELO rating 58
  5.2 Score of the implementations 61
    5.2.1 GoodBot vs Robot 61
    5.2.2 Implementation of the virtual line heuristic 61
    5.2.3 Implementation of the fair share line length heuristic 62
    5.2.4 Distributed implementation 62
    5.2.5 Final implementation 62
  5.3 Statistics of the game 64
    5.3.1 Branching 64
    5.3.2 Evaluated nodes 64
  5.4 Analysis of the style of playing 64
    5.4.1 The first game 66
    5.4.2 A second game 69

6 Future research 72
  6.1 Faster play 72
  6.2 Better play 73

7 General conclusion 74

A Source code of the search algorithms 76
  A.1 Minimax algorithm 76
    A.1.1 Initialising minimax 76
    A.1.2 The minimax algorithm 77
  A.2 Alphabeta algorithm 80
    A.2.1 Initialising alphabeta 80
    A.2.2 The alphabeta algorithm 81

B Source code of the heuristic algorithms 84
  B.1 Heuristic algorithms 84
    B.1.1 Measuring real line length 84
    B.1.2 Measuring virtual line length 85
    B.1.3 Measuring fair share line length 87
    B.1.4 Combination of heuristics 89

C Analysis of some games 91
  C.1 A first game 91
  C.2 A second game 92

D Nederlandse samenvatting (Dutch summary) 93
  D.1 Inleiding 93
  D.2 Beschrijving van het spel 94
    D.2.1 De Tantrixtegels 94
    D.2.2 Doel van het spel 94
    D.2.3 De spelregels 95
  D.3 Bestaande robots en menselijke strategieën 96
    D.3.1 Bestaande robots 96
    D.3.2 Menselijke strategieën 96
  D.4 Algoritmes en heuristieken 98
    D.4.1 De spelboom van Tantrix 98
    D.4.2 Heuristieken 99
  D.5 Implementatie 101
    D.5.1 Een gedistribueerde implementatie 101
  D.6 Resultaten 102
    D.6.1 Performantie van de implementatie 102
    D.6.2 Statistieken van het spel 103
  D.7 Een blik op de toekomst 103
  D.8 Besluit 104

Bibliography 105

List of figures 106
Introduction

There is a popular cliché, which says that you cannot get out of computers any more than you have put in, that computers can only do exactly what you tell them to, and that therefore computers are never creative. This cliché is true only in a crashingly trivial sense, the same sense in which Shakespeare never wrote anything except what his first schoolteacher taught him to write - words.

- Richard Dawkins
Since the beginning of human history, people have tried to understand the human way of thinking. Even now, we do not fully understand how a collection of billions of cells can perceive, understand, predict and manipulate a world far larger and more complicated than itself. Research in the field of artificial intelligence goes even further: it attempts not only to understand, but also to build intelligent systems.

Since research on artificial intelligence started in the fifties, much work has been done in the field of state space search, using common board games. In addition to their inherent intellectual appeal, board games have some interesting properties. Firstly, most board games are played using a well-defined set of rules. This makes it easy to represent the states and the actions, without the ambiguities and complexities inherent in less structured problems. Secondly, board games can generate extremely large search spaces, large and complex enough to require powerful techniques for determining which alternatives to explore.

The achievements of AI research in board games have been tremendous over the past decades. The world champion of checkers, Marion Tinsley, was defeated in 1994 by a program called Chinook. A few years later, in 1997, another program called Logistello defeated the human world champion of Othello, Takeshi Murakami, by 6-0. In the same year, Deep Blue achieved a narrow victory over Garry Kasparov in chess. With this victory, an old AI dream came true. However, Deep Blue does not use human-like reasoning patterns. Instead, it uses brute-force search on fast hardware. The main lesson learned from this event is that in some domains the creativity, reasoning and intuition of humans can be exceeded by brute-force search using simple evaluation functions.
On the other hand, Logistello and TD-Gammon, a program for playing backgammon, showed that it is possible to reach and even exceed human master level by combining clever search algorithms with machine learning algorithms.
Despite these successes, many problems still remain. At this moment, all computer programs for playing the ancient game Go can still be defeated by average players. For numerous other games, computer programs have not reached human master level yet.

Tantrix is another example of a strategy game. It took on its present form in 1992. The game was invented by Mike McManaway from New Zealand. Chance is also involved in the game. In 1997, the first robots were developed to play Tantrix. These robots play quickly, but can be beaten by advanced players. In this thesis, the goal was set to build a new robot that plays Tantrix at human master level.

In chapter 1, a short overview of the history is given and the rules for playing Tantrix are explained. Chapter 2 gives an overview of the robots which already existed before this project; in the same chapter, the strategies that have been discovered so far are described as well. Chapter 3 explains how minimax and alpha-beta were used in the implementation, and discusses the different heuristics and how they can be combined. In chapter 4, an overview of the implementations is given. Chapter 5 discusses the results achieved by this project. Next, chapter 6 makes some suggestions for future improvements and, finally, a general conclusion summarises the main findings of this thesis.
Chapter 1

Tantrix history and game description

1.1 Tantrix history
Tantrix, whose name is derived from 'tangled tracks', was invented in 1987 by Mike McManaway, who was born and lives in New Zealand. He was the New Zealand national champion in backgammon. The first version in which Tantrix appeared was called Mind Game, named after the shop he owned. It consisted of fifty-six hexagonal pieces, as it still does today, but the tiles had only black and red lines, and this version had to be played by two players. In 1991, Tantrix evolved towards the game as it is known today: four colours are now used, namely red, yellow, green and blue, and this version can be played with 2-4 players. The last change took place in 1992. Mind Game contained eight tiles with three straight lines on them; the decision was taken to remove these eight tiles, as they only fit in three different forced spaces and also slowed down the game. Figure 1.2 shows the current tile set. Tantrix is available in a game pack, and since 1996 it can also be played on the internet at http://www.tantrix.com. On this website, there are about 1500 different players, of whom about 120 are master players and 10 are international masters, but there is only one Tantrix grand master, Matt Peek.
1.2 Description of the game

1.2.1 Tantrix tiles
Figure 1.1  A summary of the game:
• 56 tiles, 4 colours
• The score is the length of the longest line
• Only the longest line counts
• Loops get double points
• Colours have to match
• Forced spaces have to be filled
• Time is restricted to 15 minutes per player

Tantrix consists of a set of 56 different tiles. Each tile has a black background and three coloured lines on it. Figure 1.2 shows all 56 tiles used to play Tantrix. Although each tile is unique, there is some structure in the tile set. Each tile carries three of the four colours, so each colour appears on 56 × 3/4 = 42 tiles: 42 tiles contain red, 42 yellow, 42 green and 42 blue. When we ignore the colours on the tiles, we see that there are four possible shapes for a tile, as shown in figure 1.3: sint (single intersection), brid (bridge), chin (Chinese character) and rond (roundabout). In the tile set, there are eight roundabouts, twelve bridges, twelve Chinese characters and twenty-four single intersections. The tile set of Tantrix can be compared to the cards in a card game: in both games it is very important to know which tiles have already been played, which are available on the racks of the players and which are still hidden in the bag.
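The counts above can be verified with a line of arithmetic: each tile shows exactly three of the four colours, so the colour occurrences spread evenly over the set. A quick sanity check (pure counting, no tile data needed):

```python
# Each of the 56 tiles carries exactly 3 of the 4 colours, so the set
# contains 56 * 3 = 168 coloured lines, spread evenly over the colours.
TILES = 56
LINES_PER_TILE = 3
COLOURS = 4

colour_occurrences = TILES * LINES_PER_TILE      # 168 coloured lines in total
tiles_per_colour = colour_occurrences // COLOURS
print(tiles_per_colour)  # -> 42: 42 tiles contain red, and likewise
                         #    for yellow, green and blue

# The shape counts also add up to the full set:
shapes = {'rond': 8, 'brid': 12, 'chin': 12, 'sint': 24}
print(sum(shapes.values()))  # -> 56
```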
1.2.2 Objectives and the start of a new game
Although Tantrix can be played with up to four players, the game is typically played by two opponents. There are four colours in Tantrix: red, yellow, green and blue. Each player chooses a different colour and picks one tile from the bag. All tiles are numbered, and the player who has taken the tile with the highest number starts the game. Before the game starts, each player takes five more tiles from the bag. The rest of the tiles remain hidden in the bag and are drawn one by one as tiles are played. Every player has to keep his own tiles visible to the opponent(s).

The goal of the game is to build a line which is longer than the lines of the opponent. Only the highest-scoring line or loop counts: a line scores one point for every tile it contains, and a loop scores two points for every tile. The first player starts the game by putting one of his six tiles in the middle of the table and picking a new random tile from the bag. Then the player on his left can choose one of his tiles, put it next to the tile of the first player and take a new random tile from the bag. In tournament games, the total thinking time for each player is limited to fifteen minutes.
Figure 1.2  The current tile set (from Tantrix.com [2])
Figure 1.3  Four basic shapes (from Tantrix.com [7])
1.2.3 The rules of the game
A turn

A turn in Tantrix typically consists of three actions. Firstly, you have to fill all forced spaces that can be filled by one of the tiles you have. Secondly, you can play your so-called free move, and finally, you have to fill the forced spaces that can be filled once more. An example of a forced space is shown in figure 1.4. A forced space is created when an empty place on the board is surrounded by exactly three tiles¹. If more than one tile on your rack fits in the forced space, you can play the tile of your choice. If one tile fits in two or more spaces, you can also choose in which forced space you play the tile. As long as there are tiles left in the bag, you get a new tile each time you have played one. This new tile might fit in another forced space, so you have to repeat this procedure until you have filled all the forced spaces you can. Only then may you play your free move: during your free move you can play any of the tiles on your rack on any empty space on the board where the tile fits. After this free move, you once again have to fill all forced spaces that can be filled. When you have filled all of them, your turn is over.

Rules and restrictions when playing a tile

When you play a tile, you have to follow some rules and restrictions. As shown in figure 1.5, the colours of touching links have to match. This rule is called the golden rule. The game situation at the right side of figure 1.5, which links two differently coloured lines, is not allowed. Furthermore, there are three additional restrictions which apply as long as there is a tile left in the bag. These restrictions exist because, without them, games are likely to become blocked. The first restriction says that it is not allowed to create a forced space surrounded by three links of the same colour: no tile from the Tantrix tile set could ever fill such a forced space. This restriction rule is shown in figure 1.6. Secondly, it is not allowed to surround an empty space with four tiles. This rule is illustrated in figure 1.7. Finally, it is not allowed to play along a controlled side either. This is explained by means of figure 1.8: the forced space marked with A has to be filled before space B can be filled. This rule even holds for a whole row of places. In figure 1.9, you can see an example of such a controlled side. Space A has to be filled before space B, space B before space C, and space C before space D.

¹ When the bag is empty and the restrictions are lifted, a forced space can be surrounded by more than three tiles.
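The turn structure described above (forced fills, one free move, forced fills again, with a replacement tile drawn after every play) can be sketched as follows. This is a simplified model, not the real Tantrix engine: a forced space is represented only by an identifier and the set of tile identifiers that fit it, and all other legality checks are ignored.

```python
def fill_forced_spaces(forced, rack, bag):
    """Play rack tiles into forced spaces until none fits any more,
    drawing a replacement tile from the bag after every play.
    `forced` maps a space id to the set of tile ids that fit it."""
    played = []
    progress = True
    while progress:
        progress = False
        for space, fitting in list(forced.items()):
            tile = next((t for t in rack if t in fitting), None)
            if tile is not None:
                rack.remove(tile)
                del forced[space]
                played.append((space, tile))
                if bag:                     # a fresh tile may fit another space
                    rack.append(bag.pop())
                progress = True
    return played

def play_turn(forced, rack, bag, free_move):
    """One turn: forced fills, then the free move, then forced fills again."""
    moves = fill_forced_spaces(forced, rack, bag)
    if free_move in rack:                   # the free move may go anywhere legal
        rack.remove(free_move)
        moves.append(('free', free_move))
        if bag:
            rack.append(bag.pop())
    moves += fill_forced_spaces(forced, rack, bag)
    return moves
```

Note how the inner loop restarts after every play: the replacement tile drawn from the bag may itself fit a forced space, exactly as the rule requires.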
Figure 1.4  An example of a forced space
Figure 1.5  Colours of touching links have to match (from Tantrix.com [5])
Figure 1.6  The first restriction rule (from Tantrix.com [4])
Figure 1.7  The second restriction rule (from Tantrix.com [4])
Endgame

When the bag is empty, the three additional restrictions no longer hold. This phase of the game is called the endgame. In these last twelve moves, it is allowed to create forced spaces with three links of the same colour. Furthermore, you can surround a forced space with four or more tiles, and the controlled-side restriction is lifted.

The end of the game

When all the tiles have been played on the board, the game is over. An example of a board situation at the end of a game is shown in figure 1.10. In this game, a player with colour blue played against a player with colour red. The score a player receives is the highest number of points assigned to any line or loop in his colour. The points assigned to a line equal its length in tiles; the points assigned to a loop equal twice its length in tiles.

Scoring and timing

In the game represented by figure 1.10, 'red' has a line of 22 tiles, a line of 13 tiles, a line of 2 tiles, 2 lines of 1 tile and a loop of 3 tiles. The loop of 3 tiles receives 6 points, but this is less than the line of 22 tiles, so the final score for red is 22. On the other hand, 'blue' has a loop of 22 tiles, 2 lines of 5 tiles, a line of 2 tiles and 8 lines of 1 tile. Here, the highest points go to the loop of 22 tiles, which receives a score of 44. This means that 'blue' has won the game. As explained in the Tantrix tournament rules [10], there is a time limit of fifteen minutes per player per game. If you exceed the limit, however, you lose tournament points.
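The scoring rule used in this example is easy to state as code: a line is worth its length in tiles, a loop twice its length, and only a player's single best line or loop counts. A minimal sketch, taking precomputed line and loop lengths as input rather than extracting them from a board:

```python
def tantrix_score(line_lengths, loop_lengths):
    """Score for one colour: the best of (line length) and (2 * loop length)."""
    candidates = list(line_lengths) + [2 * n for n in loop_lengths]
    return max(candidates, default=0)

# The game of figure 1.10:
red = tantrix_score([22, 13, 2, 1, 1], [3])      # loop of 3 scores 6; line of 22 wins
blue = tantrix_score([5, 5, 2] + [1] * 8, [22])  # loop of 22 scores 44 and wins
print(red, blue)  # -> 22 44
```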
Figure 1.8  The third restriction rule (from Tantrix.com [4])
Figure 1.9  A controlled side (from Tantrix.com [7])
Figure 1.10  A game board at the end of the game
Chapter 2

Previous work

2.1 Existing robots

2.1.1 History of the robots playing Tantrix
This work is not the first effort to create a robot to play Tantrix. Years ago, in 1997, two programs called Challenger and Defender were created by Richard Clarke, the owner of Polymedia. Richard Clarke is from Nelson, New Zealand, the town where Mike McManaway also lives. Clarke's two robots were sold commercially and ran on the Windows operating system. Although these robots have never been ranked, Mike McManaway, the inventor of Tantrix, estimates their lobby rankings at 550 and 650 respectively. These robots can easily be beaten by a novice player. In 1996, Tantrix was also made available on the web. Internetivity, based in Ottawa (Canada), created this first online version of Tantrix, programmed in Java. Two years later, Dave Dyer became the programmer at Tantrix.com. He programmed Robot3, Robot2 and Robot in 1997. These robots have fixed lobby rankings of 300, 550 and 750 respectively. In 2005, he also programmed GoodBot as a reaction to our project. GoodBot has a non-fixed lobby ranking, varying around 840.
2.1.2 How Robot and GoodBot work
Both Robot and GoodBot were programmed by Dave Dyer and are closed source. A short description of these two robots can only be found at [6]. Both robots consist of a move generator, an evaluator and a lookahead manager.

The move generator is a process that tries all six tiles in all possible positions and remembers the legal moves.

The evaluator is a process that tries to estimate what the final score of each line will be. Each line is first considered separately; the maximum estimate over all lines is the estimate for the final score. For each line, the evaluator considers which nearby lines are certainly connected or will probably be connected. Groups of lines that might get connected are treated as one line, weighted by the probability that the connections will actually be made. This probability is estimated by counting the remaining tiles that fit in the connection space(s). Every group of lines either forms a loop or has two open ends. The open ends are predicted to be extended by a fair share of the tiles left in the bag.

The lookahead manager is a recursive component that considers all the forced move sequences in any order, and all forced moves the opponent will have to make before his free move. However, it considers neither the free move of the opponent nor the tiles that might be drawn from the bag to replace the played ones. Furthermore, there is an absolute limit of five moves to the depth of the search. Finally, the lookahead manager does not use alpha-beta pruning.
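Since Robot and GoodBot are closed source, this evaluator can only be paraphrased. As one possible reading of the description above, the estimate for a group of lines might look like the following sketch; all names, parameters and the exact weighting are assumptions, not the real implementation.

```python
def estimate_final_length(group_length, connections, open_ends, fair_share_per_end):
    """Hypothetical sketch of a GoodBot-style line estimate.

    group_length       -- tiles already in this (group of) line(s)
    connections        -- (extra_length, fitting_left, fitting_total) per gap
                          that would merge a nearby line into the group
    open_ends          -- 0 for a loop, otherwise 2
    fair_share_per_end -- assumed expected growth of one open end from the bag
    """
    estimate = group_length
    for extra, left, total in connections:
        # Weight each possible merge by the chance its gap can still be
        # filled, approximated by the fraction of fitting tiles unseen.
        estimate += extra * (left / total if total else 0.0)
    # Open ends are predicted to grow by a fair share of the remaining tiles;
    # a loop (open_ends == 0) cannot grow at all.
    estimate += open_ends * fair_share_per_end
    return estimate
```

For example, a 10-tile line with one possible 4-tile merge whose gap can still be filled by 1 of its 2 fitting tiles, two open ends, and an assumed fair share of 1.5 tiles per end would be estimated at 10 + 4 × 0.5 + 2 × 1.5 = 15 tiles.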
2.2 Human strategy
Since Tantrix appeared in its present form in 1992, a lot of games have been played and a lot of strategies to win the game have been discovered. Every month, more than 50,000 games are played online, and some people have shared their experiences with the public. Information about this can be found in the game booklet [1], on Tantrix.com [7], on D. Dyer's homepage [8] and on the tournaments page [9]. The strategy advice can be divided into three parts. The first part is the startgame, which covers the first twenty moves. Then we continue with the midgame, until the bag is empty (another twenty moves). When the bag is empty, we arrive at the last stage of the game, the endgame (the last twelve moves). This subdivision only means that the strategies dealt with are very common in that specific phase of the game; any strategy can, however, be used at any moment during the game.
2.2.1 Startgame
• A first strategy, which might particularly be exploited at the beginning of a game, is to maximise the indirect line length by creating gaps and links which are likely to be filled or connected in the course of the game. Playing an indirect link adds two points to your indirect line length, while a direct link adds only one point. This is illustrated in figure 2.1.

• As mentioned before, there are only forty-two tiles of a certain colour. Therefore, a second strategy is to try to waste tiles of your opponent by forcing him into a small loop. This is illustrated in figure 2.2. This small loop cannot be extended any more; in this way, the four tiles are wasted and will not be part of the longest line. This can also be done in a more subtle way, as shown in figure 2.3. The tile played at position A forces the tile at position B, which leads to a small loop of seven tiles. Five to seven tiles are wasted in this construction. Loops of up to seven, eight or nine tiles - which are awarded 14, 16 or 18 points respectively - are often too small to win the game (see figure 2.4). As the average winning score in Tantrix is around 21 points, it is certainly worth a try to close a line of twelve or more tiles into a loop. Even for experienced players this is quite difficult, but threatening your opponent with a (possible) long loop can already waste some of his moves, as he will try to defend.
2.2.2 Midgame
• As only the longest line is taken into account when the results are calculated, it is important to create one long line. The further you are into the game, the more difficult it is to connect two separate lines. In figure 2.5, the red lines are isolated by playing the tile at position A.

• When the end of the game is near, some tiles which are still left in the bag can become crucial to winning the game. It is very important to keep these tiles in mind. A forced space can be filled by six tiles, unless it is surrounded by three different colours; in the latter case it can only be filled by five tiles. In figure 2.6, you can see a forced space at position A. There are six tiles that fit in that space, of which four have already been played on the board, so there are still two tiles left that fit in this forced space. Towards the end of the game, it is thus strongly recommended to consider carefully at each move whether to extend your line or to fill indirect connections. On the other hand, as you have to fill all the forced spaces you can, you can often extend your line with multiple tiles in one turn.

• Creating a forced space, or a space in a controlled side, in such a way that it cannot immediately be filled is another strategy, called blocking. A distinction can be made between a temporary and a permanent block. A permanent block arises when there is no tile left that fits in a forced space; this strategy is illustrated in figure 2.7. If no tile available on the racks of the players fits in the space at A, but some tiles left in the bag do fit, we have a so-called temporary block. As seen in figure 2.8, a line can be temporarily blocked on both ends with one tile if both ends come out of the Tantrix on the same side. Furthermore, the number of tiles that fit in a forced space can be reduced by creating pairs of the same colour on the two sides of the forced space. This is illustrated in figure 2.9. In this figure, a crucial tile for red is missing. Due to the two green pairs on both sides of the forced space, it can only be filled by one tile as long as there are tiles left in the bag. This significantly reduces the chance that the gap will be filled. Finally, it is also possible to end up in a game situation where no tile can be played at all. In such a case, the game is completely blocked. This situation only happens in about one out of 10,000 games.
Figure 2.1  An indirect red line with a length of five tiles (from Tantrix.com [7])
Figure 2.2  A small indirect red loop of four tiles (from Tantrix.com [7])
Figure 2.3  A small indirect red loop of seven tiles (from Tantrix.com [7])
Figure 2.4  A red loop of 9 tiles (from Tantrix.com [7])
• If the opponent still needs a crucial tile at a forced space, you can try to create another forced space on the board where that crucial tile also fits. You might be lucky enough to draw that crucial tile from the bag and be forced to play it in one of the forced spaces; as there is another similar forced space on the board, you can then safely play it away in that position. You can manipulate your fortune a little by playing as many tiles in one turn as you can. This can be done by creating a controlled side where you can consecutively fill several adjacent forced spaces. The strategy of creating another forced space where a crucial tile also fits is called a lookalike. An example of a lookalike space is shown in figure 2.10. Note that, although both forced spaces have the successive colours yellow, red and blue clockwise, this is not strictly necessary: you could also create a forced space with the successive colours of the other side of the crucial tile. Counting the remaining tiles and making good lookalike spaces is typically done by master players.
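The tile counting mentioned in the last two strategies can be made concrete: a forced space is matched by six tiles from the set, or only five when its three surrounding links all have different colours, and subtracting the fitting tiles already visible (on the board or on the racks) gives the number that can still turn up. A small sketch of this bookkeeping; the colour encoding here is an assumption, not the thesis's representation.

```python
def fitting_tiles_left(edge_colours, seen_fitting):
    """How many unseen tiles can still fill a forced space.

    edge_colours -- the three link colours around the space, e.g. ('R', 'Y', 'B')
    seen_fitting -- the fitting tiles already played or visible on the racks
    """
    # Six tiles in the set fit a forced space, unless it is surrounded by
    # three different colours, in which case only five fit.
    total = 5 if len(set(edge_colours)) == 3 else 6
    return total - len(seen_fitting)

# The situation of figure 2.6: six tiles fit, four are already on the board.
print(fitting_tiles_left(('R', 'R', 'B'), {'t1', 't2', 't3', 't4'}))  # -> 2
```

When this count reaches zero, the block described above has become permanent.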
2.2.3 Endgame
The endgame is very different from the rest of the game. As there are no tiles left in the bag, the game is completely deterministic. Except for the golden rule, all other restrictions are lifted. In the endgame it is important to be the first one to play a free move after the bag is empty. As the three restriction rules no longer apply with no tiles left in the bag, you can now, firstly, create a forced space with three links of the same colour, secondly, surround a forced space by four or more tiles and, thirdly, place tiles in a controlled side. As controlled sides no longer exist, you can connect an indirect link or prevent the opponent from closing his indirect link at this stage of the game. An example is shown in figure 2.11. Red is permanently blocked in B, because no tile exists that can fill a forced space requiring three links of the same colour. You should also try to enter the endgame phase without a blocked longest line: if one end has already been blocked earlier, the opponent only has to block the other side of the line.
2.2.4 Global strategies
In the previous sections, specific strategies for specific game situations were explained. This section covers some more general strategies.
• Always try to waste tiles of the opponent. Often, it is possible to combine extending your own line with wasting tiles of the opponent's colour.
• Often, when you try to close a loop, the loop is blocked on one side of the Tantrix. You can then start a new line next to an endpoint of the loop. There are two options: if you are lucky, the loop will be closed; otherwise, it will probably be extended and connected with the nearby started line.
Figure 2.5: Two isolated red lines (from Tantrix.com [7])
Figure 2.6: There are only two tiles left that fit in space A (from Tantrix.com [7])
Figure 2.7: The red line is blocked in A (from Tantrix.com [7])
Figure 2.8: The yellow line is blocked by one tile
Figure 2.9: A pair block (from cheekymbk-17-GoodBot-24-2005-04-21-1709)
Figure 2.10: A lookalike (from Tantrix.com [7])
Figure 2.11: Red is permanently blocked in B (from Tantrix.com [7])
Chapter 3: Algorithms

3.1 Search algorithms

3.1.1 The Tantrix game tree
A possible way to reason about a game is to use a game tree (Winston [11]). In a game tree, nodes denote board configurations and branches indicate how one game situation can be transformed into another by a single move. A game situation is more than a board situation: it also includes whose turn it is and which tiles each player has on his rack. The branching factor is the average number of possible moves over all game situations. In games played by two players, there are two kinds of nodes: maximizing nodes and minimizing nodes. When you play against an opponent, the maximizing nodes are the positions in which you have to play, the minimizing nodes are the positions in which the opponent plays. For this reason, the player who is on turn in the maximizing nodes is called the maximizer, the opponent is called the minimizer. If both players play optimally, one player always picks the most promising move in a maxnode, while the opponent does the opposite. The structure of the Tantrix game tree follows from the rules of the game described in section 1.2. A turn consists of three parts. Firstly, a player has to fill the forced spaces he can, secondly he can play his free move and finally, the player once again has to fill all the possible forced spaces. Typically there is much choice for a free move, but only little choice for a forced space. An example of the Tantrix game tree is given in figure 3.1. Here, we can see that these special game rules also lead to a special tree. A turn typically consists of multiple moves, so it is possible to have multiple min- or maxnodes after each other. This leads to a diamond structure of groups of min- or maxnodes following each other. For most games, exhaustive search in the game tree is impossible. This also holds for Tantrix. In Tantrix, approximately one out of three moves is a free move and two out
Figure 3.1: The game tree of Tantrix
of three moves is a forced move. In chapter 5, we found that the average branching in midgame is around 46 for free moves; the branching for forced moves is on average 2. Supposing that the moves always iterate as a free move followed by two forced moves, and that the branching for a free move in the endgame is 70, we find that there are approximately (46 × 2 × 2)^15 × (70 × 2 × 2)^3 × 70 × 2 ≈ 3 × 10^43 leaves at the end of the game tree. This number is far too large for an exhaustive search.
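The size estimate above can be checked quickly with logarithms. The following sketch (not part of the thesis code; the numbers 46, 70 and the turn structure are taken from the text) computes the base-10 exponent of the leaf count:

```java
// Rough size of the Tantrix game tree, as estimated in the text:
// 15 midgame turns of (46 free x 2 forced x 2 forced),
// 3 endgame turns of (70 free x 2 forced x 2 forced),
// plus one final free move (70) and one forced move (2).
public class TreeSize {
    public static double log10Leaves() {
        return 15 * Math.log10(46.0 * 2 * 2)
             + 3 * Math.log10(70.0 * 2 * 2)
             + Math.log10(70.0 * 2);
    }

    public static void main(String[] args) {
        // The exponent comes out around 43.5, i.e. roughly 3 x 10^43 leaves.
        System.out.printf("about 10^%.1f leaves%n", log10Leaves());
    }
}
```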
3.1.2 Static evaluation
Another approach would be possible if there were some way to rank all the game situations: simply play the move that leads to the best situation reachable in one move. Unfortunately, no perfect ranking procedure exists. When one board situation is enormously superior to another, a simple measure, such as line counting, is likely to reflect that superiority, but using only this approach would lead to poor results.
3.1.3 Minimax
If a computer wants to play the best move, it has to find the optimal path in the game tree. The optimal path is the path that leads to the leaf where the computer wins with the highest possible margin, assuming that the opponent always picks the move leading to the worst position for the computer. If victory is no longer possible under these assumptions, the optimal path leads to the leaf where the computer draws or loses by the smallest number of points. Searching the whole tree is infeasible, but if the search terminates at some reasonable depth, the leaf situations can be compared, yielding a basis for move selection. Of course, the underlying assumptions of this approach are that the merit of a move becomes clear as the move is pursued, and that the search goes deep enough that even a rough board evaluation is satisfactory. Suppose we have a situation analyzer that converts all judgements about a game position into a single, overall quality number called the static evaluation score. By convention, a positive number indicates an advantage for the computer; a negative number indicates an advantage for the opponent. The degree of advantage increases with the absolute value of the number. The procedure that calculates this number is called the heuristic function; its calculation is covered in section 3.3.
Algorithm 3.1.1: MiniMax(depth)
if limit of search has been reached
    then return (the static value of the current game position)
if this is a maximizing level
    then use MiniMax(depth + 1) on all the children
         return (the maximum of all the results)
    else use MiniMax(depth + 1) on all the children
         return (the minimum of all the results)
A first assumption is that the opponent will always choose the best move for himself, which is the worst move for the computer. Secondly, the computer will always choose the best move for itself. With the minimax algorithm, we can search the tree for the best move. Minimax is a depth-first algorithm which analyses the effect of a move into the future. If this recursive algorithm, described in algorithm 3.1.1, encounters a leaf, the static evaluation score of the game situation at that point is returned. Otherwise, if the node is a maxnode, minimax is applied to all the children of the current position and the maximum value is returned; if it is a minnode, the minimum of the children is returned.
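As an illustration (a minimal sketch, not the thesis implementation), the recursion of algorithm 3.1.1 can be written in Java over a generic tree; the Node class here is a stand-in for a real game-state representation:

```java
import java.util.List;

// Minimal minimax over an abstract game tree. A node is either a leaf
// carrying a static evaluation score, or an inner node with children.
public class MiniMax {
    static final class Node {
        final int value;             // static evaluation (used at leaves)
        final List<Node> children;   // empty for leaves
        Node(int value, List<Node> children) { this.value = value; this.children = children; }
        static Node leaf(int v) { return new Node(v, List.of()); }
        static Node inner(Node... cs) { return new Node(0, List.of(cs)); }
    }

    public static int miniMax(Node n, boolean maximizing) {
        if (n.children.isEmpty()) return n.value;          // limit of search reached
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Node c : n.children) {
            int v = miniMax(c, !maximizing);               // levels alternate here
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        // max( min(6, 5), min(8, 3) ) = max(5, 3) = 5
        Node root = Node.inner(Node.inner(Node.leaf(6), Node.leaf(5)),
                               Node.inner(Node.leaf(8), Node.leaf(3)));
        System.out.println(miniMax(root, true)); // prints 5
    }
}
```

Note that in Tantrix the maximizing/minimizing flag would follow the turn structure (several max- or minnodes in a row for forced moves) rather than strictly alternating as in this sketch.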
3.1.4 Minimax adapted to Tantrix
Algorithms 3.1.2 and 3.1.3 illustrate minimax for Tantrix as it was used in a first implementation. The Java source of this particular piece of code can be found in appendix A.1. Algorithm 3.1.2 does some initialisation, like setting the standard depth and the maximum depth when forced moves are involved. It also detects whether there actually is more than one possible move; if not, that move is played immediately. This search algorithm does not take into account which tiles might replace the tiles that were played on the board. As all the tiles in Tantrix are different, the branching of the game tree would become huge. For the first move, for example, there are always six possibilities. If we took into account the possible tiles that can replace the first played tile, this would lead to a branching of 264 for the first move, because there are 44 tiles left in the bag and 6 tiles that can be played from the rack. Neither do humans consider which tiles might enter the game when searching for a good move. However, they do consider the tiles when evaluating a possible game position in midgame. For this reason, the heuristic function, as described in section 3.3, will take into account which tiles are left on the racks and in the bag.
Heuristic continuation: the event horizon

Early chess programs that searched all paths to the same depth often seized low-valued pieces along paths that led to the loss of high-valued pieces further on in the game. They did so because the loss of the high-valued pieces occurred beyond the depth they searched: the capture of the piece was beyond the horizon. If the basic minimax algorithm were used for Tantrix, the computer could have a false sense of euphoria if its analysis stopped just before the opponent permanently blocks the computer's line. Similarly, it could have a false sense of depression if its analysis stopped just before it has to play a forced move which connects its two longest lines. To avoid this horizon effect, the minimax algorithm used for Tantrix searches deeper when forced moves are involved in a particular path.

The depth of the search

In Tantrix, approximately two-thirds of the moves are forced moves. As we do not consider the tiles that might come in from the bag, it is not helpful to search deeper than three turns: due to new tiles appearing on the racks of the players, the game situation changes too much. If the search went deeper, this would lead to over- or underestimated moves. Another tile might enter the game, and the opponent can divert your line by playing a new tile that, for instance, also fits in the created forced space. In the first 30 moves, Tantrix players call this a lucky draw. Another reason to limit the depth of the search to three is the branching: in midgame, the branching is around 46, so searching one move deeper would take 46 times as long with minimax.
Description of the Tantrix minimax algorithm

Algorithm 3.1.2: InitialiseMiniMax()
global maxdepth, maxforceddepth, currentplayer, playedfreemove
local bestmove, score, bestscore
currentplayer ← computer
maxdepth ← 3
maxforceddepth ← 8
if we have already played our free move
    then playedfreemove ← true
    else playedfreemove ← false
bestscore ← −∞
if a forced move must be played
    then if the number of possibilities = 1
             then return (that move)
             else for each possible forced move
                      play the move
                      score ← MiniMax(1)
                      undo the move
                      if score > bestscore
                          then bestscore ← score
                               bestmove ← move
    else playedfreemove ← true
         for each possible free move
             play the move
             score ← MiniMax(1)
             undo the move
             if score > bestscore
                 then bestscore ← score
                      bestmove ← move
return (bestmove)
A first method, shown in algorithm 3.1.2, initialises some variables before calling the minimax function. It sets both maxdepth and maxforceddepth. The variable maxdepth denotes how many moves deep the algorithm will investigate game positions when no forced moves are involved; maxforceddepth contains the depth limit when forced moves are involved. The algorithm also reflects the three stages of a turn in Tantrix. Firstly, one has to fill all the forced spaces that can be filled, then one plays a free move and finally, a player once again has to fill all the possible forced spaces. The variable playedfreemove distinguishes between the first and second stage where one has to fill the forced spaces: if this variable is true, the free move has already been played.
It is also possible that there is only one possible move. This is the case when there is one forced space on the board in which only one tile from a player's rack fits. In that case, the player has to play that particular tile. This function detects such a situation and immediately returns the move without wasting any time on it. Algorithm 3.1.3 shows how minnodes are processed. If the maximum depth is reached, the static evaluation of that game position is returned. Otherwise, the algorithm distinguishes between the three stages of a turn in Tantrix. If a forced move has to be played in this node, the variable maxdepth is increased. This guarantees that the search will continue at least one level further. If the free move has been played and there are no forced moves left, then a similar algorithm is called to process the maxnode.

Complexity of minimax search

If the average branching of the game tree is b and the search depth is d, the complexity is O(b^d).
Algorithm 3.1.3: EvaluateMin(depth)
if depth = maxdepth or depth = maxforceddepth
    then return (static evaluation of this position)
local resultscore, score
resultscore ← ∞
if not playedfreemove
    then if a forced move must be played
             then for each possible forced move
                      play the move
                      maxdepth++
                      score ← EvaluateMin(depth + 1)
                      maxdepth−−
                      undo the move
                      if score < resultscore then resultscore ← score
                  return (resultscore)
             else playedfreemove ← true
                  for each possible free move
                      play the move
                      score ← EvaluateMin(depth + 1)
                      undo the move
                      if score < resultscore then resultscore ← score
                  return (resultscore)
    else if a forced move must be played
             then for each possible forced move
                      play the move
                      maxdepth++
                      score ← EvaluateMin(depth + 1)
                      maxdepth−−
                      undo the move
                      if score < resultscore then resultscore ← score
                  return (resultscore)
             else switch to computerplayer
                  resultscore ← EvaluateMax(depth)
                  switch back to opponent
                  return (resultscore)
3.1.5 Alphabeta search
During midgame, the branching in the game tree is around 45 (see chapter 5). This means that, if we search to depth three, 45^3 = 91125 game situations need to be evaluated by the heuristic function. As we use heuristic continuation, this number is even bigger. Fortunately, it is not necessary to evaluate every leaf. There is a procedure that reduces both the number of tree branches that must be generated and the number of static evaluations that must be done, thus cutting down the overall work. This procedure is called alphabeta search. The alphabeta principle dictates that, whenever you discover a fact about a given node, you should check what you know about its ancestor nodes. This idea is illustrated in figure 3.2. While searching the tree, you always look for a move with a score between alpha and beta. Some moves are guaranteed to be worse than the best move found so far. On the other hand, some moves might lead to such a good position that the opponent will always pick another move in a minnode. If such nodes are found, they can be pruned. An illustration of this algorithm is given in figure 3.3; pseudocode is shown in algorithm 3.1.4. The maximizing nodes are indicated with a rectangle, the minimizing nodes with a circle. In the figure, alphabeta is called for a depth of four. At the beginning, alphabeta should always be called with a value of −∞ for α and ∞ for β. Alphabeta starts depth-first by expanding node B and then node D. In H, the limit of the search is reached and the static value of the position is returned; in this particular case, the value is assumed to be 6. As D is a maximizing node, α is set to 6 there. Next, I is evaluated, but α is not changed as 5 < 6. All the children of D have been investigated, so α for D is returned and in node B this value is set to β. Then, alphabeta is called for node E, with the values for α and β from B. In E, alphabeta is called for the children. In J, the static value for the board is 8. This is returned and set in E as the value of α. Now, we see that in E, α ≥ β, so we can return α. The node K will never be investigated. This is no problem, as the opponent would never choose E, because there is certainly a better move, namely the move that leads from B towards D. In E, the value of α is returned, but, as 8 > 6, β for node B remains 6. In B, β is returned and set in A to α because 6 > −∞. The values of α and β are passed to C and then to F. To evaluate F, L and M have to be evaluated first. L has the greatest value, and this is returned and set as the β value of C. This example illustrates that, with the aid of alphabeta, some nodes can be safely skipped without affecting the result.
Figure 3.2: The idea of alphabeta (from Milch [13])
Figure 3.3: A game tree processed by the alphabeta algorithm (from Milch [13])
Algorithm 3.1.4: AlphaBeta(a, b, depth)
if limit of search has been reached
    then return (the static value of the current game position)
local val, alpha, beta
alpha ← −∞
beta ← +∞
if this is a maximizing level
    then for each child
             val ← AlphaBeta(max(a, alpha), b, depth + 1)
             if val > alpha then alpha ← val
             if alpha ≥ b then return (alpha)
         return (alpha)
    else for each child
             val ← AlphaBeta(a, min(b, beta), depth + 1)
             if val < beta then beta ← val
             if a ≥ beta then return (beta)
         return (beta)
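The pruning behaviour of algorithm 3.1.4 can be sketched in Java on the same kind of abstract tree used earlier (again an illustration, not the thesis code; in Tantrix the max/min flag would follow the turn structure instead of strictly alternating):

```java
import java.util.List;

// Alphabeta search over an abstract game tree with cutoffs, mirroring
// the structure of the pseudocode in algorithm 3.1.4.
public class AlphaBeta {
    static final class Node {
        final int value;
        final List<Node> children;
        Node(int v, List<Node> cs) { value = v; children = cs; }
        static Node leaf(int v) { return new Node(v, List.of()); }
        static Node inner(Node... cs) { return new Node(0, List.of(cs)); }
    }

    public static int search(Node n, int a, int b, boolean maximizing) {
        if (n.children.isEmpty()) return n.value;      // limit of search reached
        if (maximizing) {
            int alpha = a;
            for (Node c : n.children) {
                alpha = Math.max(alpha, search(c, alpha, b, false));
                if (alpha >= b) return alpha;          // cutoff: min player avoids this node
            }
            return alpha;
        } else {
            int beta = b;
            for (Node c : n.children) {
                beta = Math.min(beta, search(c, a, beta, true));
                if (a >= beta) return beta;            // cutoff: max player avoids this node
            }
            return beta;
        }
    }

    public static void main(String[] args) {
        // The subtree of the walkthrough: B = min( max(6, 5), max(8, K) ).
        // Once J scores 8 >= beta = 6, node K (here the leaf 99) is pruned.
        Node root = Node.inner(Node.inner(Node.leaf(6), Node.leaf(5)),
                               Node.inner(Node.leaf(8), Node.leaf(99)));
        System.out.println(search(root, Integer.MIN_VALUE, Integer.MAX_VALUE, false)); // prints 6
    }
}
```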
3.1.6 Alphabeta adapted to Tantrix
In a similar way to minimax, alphabeta was adapted to Tantrix as well. The Java algorithms are given in appendix A.2. This algorithm does not take into account the tiles from the bag that might enter the game, and it searches at the same depth as the minimax algorithm for Tantrix described in section 3.1.4. The initialisation procedure also sets the variables maxdepth and maxforceddepth and makes sure that no time is wasted on forced moves when there is only one possible move. Algorithm 3.1.5 shows how a maxnode is evaluated using alphabeta search. This procedure is called from the initialisation with a equal to the best score found so far and ∞ for b.
Algorithm 3.1.5: MaxAB(depth, a, b)
if depth = maxdepth or depth = maxforceddepth
    then return (static evaluation of this position)
local val, alpha, beta
alpha ← −∞
beta ← +∞
if not playedfreemove
    then if a forced move must be played
             then for each possible forced move
                      play the move
                      maxdepth++
                      val ← MaxAB(depth + 1, max(a, alpha), b)
                      maxdepth−−
                      undo the move
                      alpha ← max(val, alpha)
                      if alpha ≥ b then return (alpha)
                  return (alpha)
             else playedfreemove ← true
                  for each possible free move
                      play the move
                      val ← MaxAB(depth + 1, max(a, alpha), b)
                      undo the move
                      alpha ← max(val, alpha)
                      if alpha ≥ b then return (alpha)
                  return (alpha)
    else if a forced move must be played
             then for each possible forced move
                      play the move
                      maxdepth++
                      val ← MaxAB(depth + 1, max(a, alpha), b)
                      maxdepth−−
                      undo the move
                      alpha ← max(val, alpha)
                      if alpha ≥ b then return (alpha)
                  return (alpha)
             else switch to computerplayer
                  val ← MinAB(depth, a, b)
                  switch back to opponent
                  return (val)
3.2 Problems with the search algorithms
As explained in Rich and Knight [14], minimax and alphabeta rely heavily on the assumption that the opponent always chooses the optimal move. This assumption is acceptable in winning situations in which a move guaranteed to be good for the computer can be found. But, in a losing situation, it might be better to take the risk that the opponent will make a mistake. Suppose we must choose between two moves, both of which, if the opponent plays perfectly, lead to situations that are very bad for the computer, but one is slightly less bad than the other. Furthermore, suppose that the least promising move could lead to a very good situation for the computer if the opponent makes a single mistake. We should choose this one, which is probably slightly worse, but possibly a lot better when the mistake is made. The alphabeta procedure, however, would choose the guaranteed bad move. A similar situation arises when one move appears to be only slightly more advantageous than another, assuming that the opponent plays perfectly. It might be better to choose the less advantageous move if it could lead to a significantly superior situation if the opponent makes a mistake. To make these decisions, we must have access to a model of the individual opponent’s playing style so that the likelihood of various mistakes can be estimated. This is very hard to provide.
3.3 Heuristics
Both the minimax and alphabeta search algorithms need to evaluate the game situation at the leaves of the tree. This section describes a way to generate an overall quality number for a certain game position in Tantrix . With the human strategies, covered in section 2.2, in mind, a global heuristic function as a combination of different heuristic functions for Tantrix is explained here.
3.3.1 Real line length
As the longest line counts in Tantrix, a trivial heuristic is the difference between the lengths of the longest lines of the two players. Figure 3.4 shows a game situation in which yellow has a real line 11 tiles long, while blue has a line of only 7 tiles. In this game situation, yellow has the upper hand. To express the advantage for yellow as a heuristic, we take the length of the longest line of yellow, which is 11, and subtract the length of the longest line of blue, which is 7. This results in a real line length heuristic of 4. Algorithm 3.3.1 shows how this can be calculated.

Algorithm 3.3.1: RealLineLengthHeuristic(gameposition)
comment: search for the lines first
global set of the lines
for each player of the game
    initialise a table with the spaces of the board
    for each empty space on the outline of the Tantrix
        for each of the six directions
            if a new line of the colour of the player starts
                then follow the line
                     mark every visited space in the table
                     add the line to the set
    comment: now search for loops
    for each unvisited place on the board
        if there is a tile with a colour of the player
            then follow the loop
                 mark every visited space in the table
                 add the loop to the set
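Once the lines have been found, the score itself is a simple difference of maxima. A small sketch (the input lists of per-colour line lengths are hypothetical; in the thesis they would come from the line and loop search of algorithm 3.3.1):

```java
import java.util.List;

// Real line length heuristic: difference between the players' longest lines.
public class RealLineLength {
    public static int heuristic(List<Integer> computerLines, List<Integer> opponentLines) {
        return longest(computerLines) - longest(opponentLines);
    }

    private static int longest(List<Integer> lines) {
        // A player with no lines yet scores 0 here.
        return lines.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    public static void main(String[] args) {
        // Figure 3.4: yellow's longest line is 11, blue's is 7, heuristic = 4.
        System.out.println(heuristic(List.of(11, 4), List.of(7, 3))); // prints 4
    }
}
```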
3.3.2 Virtual line length
Taking only the longest real line into account is certainly not enough, especially not in the startgame (see section 2.2.1). In the startgame it is important to play indirect links, as the gaps will most likely be filled later in the game. As the gaps have not been filled yet, they do not deserve the full number of points. An example of such a line is given in figure 3.5. In this figure blue leads in real points, but the situation is nevertheless better for yellow. Algorithm 3.3.2 shows a way to calculate this.

Algorithm 3.3.2: VirtualLineLengthHeuristic(gameposition)
comment: search for the lines first
global set of the lines
for each player of the game
    initialise a table with the spaces of the board
    for each empty space on the outline of the Tantrix
        for each of the six directions
            if a new line of the colour of the player starts
                then follow the line and count the gaps
                     mark every visited space in the table
                     set the length of the line to the length in tiles + 0.7 × #gaps
                     add the line to the set
    comment: now search for loops
    for each unvisited place on the board
        if there is a tile with a colour of the player
            then follow the loop
                 mark every visited space in the table
                 set the length of the loop to the length in tiles + 1.4 × #gaps
                 add the loop to the set
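The core of the valuation is the gap weighting; a minimal sketch (the weights 0.7 and 1.4 are the ones from algorithm 3.3.2, everything else is illustrative):

```java
// Virtual line length: each unfilled gap in an indirect line contributes
// 0.7 of a tile, and 1.4 for a loop (weights taken from the thesis).
public class VirtualLineLength {
    public static double virtualLength(int tiles, int gaps, boolean loop) {
        return tiles + (loop ? 1.4 : 0.7) * gaps;
    }

    public static void main(String[] args) {
        // An 8-tile indirect line with one gap counts as 8.7 virtual tiles.
        System.out.println(virtualLength(8, 1, false)); // prints 8.7
    }
}
```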
Figure 3.4: Real line length yellow vs blue: 11 - 7
Figure 3.5: Virtual line length yellow vs blue: 8.7 - 6
3.3.3 Fair share line length
In figure 3.6, the real score is equal, but the situation is much better for green than for yellow. Yellow has been blocked on both sides and cannot extend his line, while the line of green is completely free. To take such a situation into account, the fair share line length has been introduced. A fair share of the remaining tiles from the bag is added to the longest line. This fair share depends on how many tiles of a particular colour are left in the bag, and on how many tiles are left that fit the end points of the line. Algorithms 3.3.3 and 3.3.4 suggest a way to calculate this. Applying this to figure 3.6 gives the following results.
• There are 26 tiles on the board, so 30 tiles still need to be played.
• There are 18 tiles left in the bag.
• For yellow:
    – Yellow has a virtual line length of 9.
    – There are 22 tiles carrying yellow on the board; 20 tiles with yellow are left.
    – As there are still 30 tiles left, 2/3 of the remaining tiles carry yellow.
    – At each end point of the line, there are two tiles left in the bag which fit in each of these spaces.
    – The fair share line length for yellow is 9 + 5 − (18/3 × 2/3) + 5 − (18/3 × 2/3) = 11.
• For green:
    – Green has a virtual line length of 10.7.
    – Green has 18 tiles of his colour on the board. There are 24 green tiles left.
    – One of the remaining tiles is needed to fill the gap.
    – The fair share line length for green is 10.7 + (24 − 1)/2 = 22.2.
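The worked numbers for yellow can be redone in code. This is a sketch of the open-end-point term only, as it appears in algorithm 3.3.4 (NT = own-colour tiles left, TB = tiles in the bag, RT = tiles still to be played, fit = fitting bag tiles per end):

```java
// Fair share contribution of one open end point: NT/4 - TB/(fit+1) * NT/RT.
public class FairShare {
    public static double endPointShare(double NT, double TB, double RT, int fit) {
        return NT / 4 - TB / (fit + 1) * (NT / RT);
    }

    public static void main(String[] args) {
        // Yellow in figure 3.6: NT = 20, TB = 18, RT = 30, fit = 2 per end,
        // so each end adds 5 - 6 * 2/3 = 1, giving 9 + 1 + 1 = 11.
        double yellow = 9 + 2 * endPointShare(20, 18, 30, 2);
        System.out.println(yellow); // approximately 11
    }
}
```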
Algorithm 3.3.3: FairShareLineLengthHeuristic(gameposition)
comment: this method assumes that a set of virtual lines is given
global set of the lines
local fsscore
for each player of the game
    NT ← # remaining tiles of his colour
    TB ← # tiles remaining in the tile bag
    RT ← # tiles that still need to be played
    for each line of the player
        if the line is a loop
            then fsscore ← twice the virtual line length
            else fsscore ← virtual line length
                 NT ← NT − # gaps
                 for each end point of the line
                     fsscore ← fsscore + FairSharePartFor(endpoint)
                 NT ← NT + # gaps
        set the fair share line length to fsscore
Algorithm 3.3.4: FairSharePartFor(endpoint)
local nbFittingTiles, d
if the endpoint is forced or free
    then if a tile from a rack fits
             then fsscore ← fsscore + NT/4
             else nbFittingTiles ← # tiles from the bag that fit
                  if nbFittingTiles > 0
                      then fsscore ← fsscore + NT/4 − TB/(nbFittingTiles + 1) × NT/RT
                      else nbColourMatchingTiles ← # matching tiles left
                           fsscore ← fsscore + min(3, nbColourMatchingTiles)
    else d ← distance to the originating forced space
         nbFittingTiles ← # tiles from the bag that fit there
         if nbFittingTiles > 0
             then fsscore ← fsscore + NT/4 − TB/(nbFittingTiles + 1) × (NT/RT) − d
             else nbColourMatchingTiles ← # tiles left matching the originating forced space
                  fsscore ← fsscore + min(3, nbColourMatchingTiles)
3.3.4 Iterative filling of interesting places score
As shown in figure 3.7, it is interesting to know in which ways blocked games can end up. Here, red has a real line of 13 tiles, but depending on the tiles that will fill the orange dots, red might threaten with a huge loop. It is also possible for red to see how yellow can evolve and, if necessary, make a lookalike space to prevent yellow from closing a loop.

Algorithm 3.3.5: FillUpHeuristic(game)
procedure Rearrange(assignment)
    if no tile is assigned twice
        then filledupscore ← (filledupscore × n + VirtualLineLength(assignment)) / (n + 1)
             n ← n + 1
        else pick a multiply assigned tile
             for each assignment to a different space
                 if a filled forced space becomes empty
                     then clear the induced forced spaces
                 Rearrange(assignment)
                 restore the situation
    return

main
global filledupscore, n
comment: assigning one tile to multiple spaces is allowed
for each possible assignment of tiles
    Rearrange(assignment)
3.3.5 Combination of the heuristics

The search algorithms described in section 3.1 require one global number as an estimate of the current game position. To retrieve one global number for the current game situation, the results of the different heuristics described above need to be combined. The way to combine these results was determined by trial and error. The combination takes place at two levels. First, a distinction is made between a startgame and an endgame score. In the startgame, virtual line length is most important, while in the endgame real line length is most important. Throughout the game, open end points remain important. For this reason, a linear smoothing from startgame to endgame score was introduced. In the first 35 moves, the startgame score is returned. In the last 12 moves, the endgame score is
Figure 3.6: Fair share line length yellow vs green: 11 - 22.2
Figure 3.7: The orange dots are interesting for red
returned. For the moves in-between, a linear combination of startgame and endgame score is made. Pseudocode for the combination is given in algorithms 3.3.6, 3.3.7 and 3.3.8.
Algorithm 3.3.6: GetStartGameScore(game)
local maxRobot, maxOther, vll, fss
maxRobot ← −∞
maxOther ← −∞
for robot and human
    if there is no line yet
        then set the score to 21
        else for each line
                 vll ← the virtual length of the line
                 fss ← the fair share score of the line
                 if the line is a small loop
                     then vll ← −10 × vll
                 if calculations are for robot
                     then maxRobot ← max(maxRobot, 6 × vll + fss)
                     else maxOther ← max(maxOther, 6 × vll + fss)
Algorithm 3.3.7: GetEndGameScore(game)
local maxRobot, maxOther, vll, fss
maxRobot ← −∞
maxOther ← −∞
for robot and human
    if there is no line yet
        then set the score to 21
        else for each line
                 vll ← the virtual length of the line
                 fss ← the fair share score of the line
                 if the line is a small loop
                     then vll ← −10 × vll
                 if calculations are for robot
                     then maxRobot ← max(maxRobot, 6 × vll + fss)
                     else maxOther ← max(maxOther, 6 × vll + fss)
Algorithm 3.3.8: GetGlobalScoreFor(game)
global nbTiles
nbTiles ← # tiles played on the board
if nbTiles ≤ 35
    then VirtualLineLengthHeuristic(game)
         FairShareHeuristic(game)
         return (GetStartGameScore(game))
RealLineLengthHeuristic(game)
VirtualLineLengthHeuristic(game)
FairShareHeuristic(game)
if nbTiles ≥ 44
    then return (GetEndGameScore(game))
return ((44 − nbTiles)/11 × GetStartGameScore(game) + (nbTiles − 35)/11 × GetEndGameScore(game))
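The linear smoothing in algorithm 3.3.8 can be sketched in isolation (the thresholds 35 and 44 and the divisor 11 are taken from the pseudocode; the score arguments here are just placeholders for the start- and endgame evaluations):

```java
// Linear blend between startgame and endgame score by tiles played.
public class GlobalScore {
    public static double blend(int nbTiles, double startScore, double endScore) {
        if (nbTiles <= 35) return startScore;                // pure startgame score
        if (nbTiles >= 44) return endScore;                  // pure endgame score
        return (44 - nbTiles) / 11.0 * startScore
             + (nbTiles - 35) / 11.0 * endScore;             // smooth transition
    }

    public static void main(String[] args) {
        System.out.println(blend(35, 10, 20)); // prints 10.0
        System.out.println(blend(44, 10, 20)); // prints 20.0
    }
}
```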
3.4 The problem with genetic algorithms
The algorithms described above contain a lot of parameters which were tuned by trial and error. Mitchell [18] explains an approach to learning based on simulating evolution, called genetic algorithms. With this approach it is possible to determine all the parameters that were introduced by trial and error above. These parameters are responsible for the behaviour of the robot. One could create a population of robots with different parameters and let the population evolve: members give rise to the next-generation population by means of operations such as random mutation and crossover, which are patterned after processes in biological evolution. At each step, we measure the fitness of the robots, for instance by organizing a tournament between all of them. The fittest robots are selected as seeds for producing the next generation. The tournament between Oliver, the robot of this project, and GoodBot, from Dave Dyer, consists of 200 games and takes several hours to complete on a modern computer. Using genetic algorithms is therefore considered possible, but it would probably take several weeks to achieve results on a single computer.
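The operators involved can be sketched briefly. This is purely illustrative (the parameter vector, mutation size and seed are invented for the example; real fitness would be a tournament result as described above):

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the genetic operators: a robot is a vector of
// heuristic weights; mutation perturbs one weight, crossover splices two.
public class GeneticTuning {
    static final Random rnd = new Random(42);

    static double[] mutate(double[] parent, double sigma) {
        double[] child = parent.clone();
        int i = rnd.nextInt(child.length);
        child[i] += sigma * rnd.nextGaussian();   // random mutation of one weight
        return child;
    }

    static double[] crossover(double[] a, double[] b) {
        int cut = 1 + rnd.nextInt(a.length - 1);  // one-point crossover
        double[] child = a.clone();
        System.arraycopy(b, cut, child, cut, b.length - cut);
        return child;
    }

    public static void main(String[] args) {
        double[] robot = {6.0, 0.7, 1.4};         // e.g. weights like those in section 3.3
        System.out.println(Arrays.toString(crossover(robot, mutate(robot, 0.1))));
    }
}
```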
3.5 A distributed approach
So far, we have assumed that no tiles enter the game while searching. This is not a huge problem: neither do humans think about tiles that can come in from the bag during the startgame. They start considering this in the midgame, and especially in the last ten moves before the endgame. For the last 4 tiles, there are 4! or 24 possible sequences of tiles. With
Chapter 3: Algorithms
this approach, described in more detail in section 4.4, we can evaluate all these possibilities, and a representative set of the possibilities a little earlier in the game as well.
Chapter 4 Design and implementation

4.1 Interface details
For several years, it has been possible to play Tantrix on Tantrix.com [19]. The main goal of this thesis was to implement and add a robot to the server running on Tantrix.com. The result of the implementation has been running on Tantrix.com since May 2005 and is called Oliver.

Three possible ways to add a new robot were suggested when I started this project. Firstly, a robot could be made that works entirely by looking at the communication that the normal Java applet has with the server. When doing this, the robot would be one of the ordinary users in the lobby. Secondly, an interface could be implemented that returns a move for a given game situation. This means that all the logic of a robot needs to be implemented, together with a representation for the game state. Finally, it was also possible to write a new evaluator for the existing robot, called Robot.

We have chosen the second option. Dave Dyer provided a robot kit, which is described on Tantrix.com [22] and contains documentation as well. This option allows one to make a complete, unassisted robot while ignoring the details of integration in the framework on Tantrix.com. The kit also provides software to visualise the games.

The kit contains a stub, called twister/DummyBot.java, shown in listing 4.1. This stub is called from the Tantrix game client applet running in ExpBot.html, a webpage containing the applet. InitRobot is called to initialise the robot with a given game. The methods DoPlayerQuit, Quit and StopRobot were implemented to stop the robot without returning any results. When a game is finished, DoGameOver is called; in this method, other actions can be performed, for example some bookkeeping. DoTurnStep is invoked when a robot move is required. The commonGame received at initialisation contains enough information to determine the state of the game.
Besides the interface, other delivered binaries provide a network connection to a test server, the running of the other robots and also the features provided on Tantrix.com, like reviewing games. It is also possible to invite guests to the server and to play games against another existing robot in batch.

package twister;
import online.tantrix.common.*;
import online.common.*;
import java.awt.*;

public class DummyBot implements RobotProtocol {
    commonGame game = null;

    /* initialize the robot, but don't run yet */
    public void InitRobot(commonGame g, String evaluator, int strategy) {
        game = g;  // we'll need this
        System.out.println("Init from " + g + " with " + evaluator + " : " + strategy);
    }
    public void StartRobot(boolean continuous) { }
    public void StopRobot() { }
    public boolean Running() { return (false); }
    public boolean RobotPlayerEvent(int x, int y, Rectangle startrect) { return (false); }
    public void DoPlayerQuit() { }
    public void DoGameOver() { }
    public void Quit() { }
    public void DoTurnStep(commonPlayer p) {
        RootAppletProtocol root = game.theRoot;
        // this shows how to get parameters from the launching html file
        System.out.println("Parameter: jdk " + root.get("jdk"));
        System.out.println("Now Make a Move for color " + p.color
                           + " rack position " + p.index);
        System.out.println(game.b.rackSummary());
        System.out.println(game.b.boardSummary());
        System.out.println(game.b.moveSummary());
        // game.b is the GameBoard
        // game.players[] is the list of commonPlayer.
        // this is a handy trick to let the robot run until a certain point
        // if (board.tilesOnBoard >= stop_at_n_tiles) { continuous = false; }
        // to make a move
        // game.commitToTile(x, y, tidx, orient);
    }
}

Listing 4.1 The Tantrix.com interface
4.1.1 Details of the game status representation
The gameboard, denoted by game.b, has four methods which are sufficient to determine the state of the game.

A first method, called tileSet, returns a string with all the Tantrix tiles. The string will probably never change and contains:

1=GGRRYY 2=RRGGYY 3=GYRRYG 4=YGRRGY 5=YRGGRY 6=RYRGYG 7=RGRYGY 8=GRGYRY 9=YRGRGY 10=YGRGRY 11=GRYRYG 12=GYRYRG 13=RGYGYR 14=RYGYGR 15=GGBBYY 16=BBGGYY 17=GYBBYG 18=YGBBGY 19=YBGGBY 20=BYBGYG 21=BGBYGY 22=GBGYBY 23=YBGBGY 24=YGBGBY 25=GBYBYG 26=GYBYBG 27=BGYGYB 28=BYGYGB 29=RRBBYY 30=YYBBRR 31=RYBBYR 32=YRBBRY 33=YBRRBY 34=BYBRYR 35=BRBYRY 36=RBRYBY 37=YBRBRY 38=YRBRBY 39=RBYBYR 40=RYBYBR 41=BRYRYB 42=BYRYRB 43=RRBBGG 44=BBRRGG 45=RGBBGR 46=GRBBRG 47=GBRRBG 48=BGBRGR 49=BRBGRG 50=RBRGBG 51=GBRBRG 52=GRBRBG 53=RBGBGR 54=RGBGBR 55=BRGRGB 56=BGRGRB

The tiles are numbered, but the numbering differs from the one used in the game pack. This numbering is also used in the smart game format, used to store online played games. For each tile, a string is given containing the colours clockwise, starting at the northeast side.

A second method, rackSummary, returns the current status of the racks. For figure 4.1, this method returns:

0:G 47 25 5 3 46 19 1:B 36 31 6 11 40 48

This string consists of two parts. The first part is the rack for the robot in the stub; the other part is the one of the opponent. All racks are numbered. For each rack, a character denotes the colour of the player, followed by six numbers referring to tiles from the tileSet method. If a place on the rack is empty, then −1 is written.

Thirdly, boardSummary describes the current configuration of the board. For the game situation shown in figure 4.1, the output of this method is:

7@28,18 0 8@25,17 5 15@27,17 3 17@29,15 3 18@28,14 0 20@28,16 3 32@25,15 2 34@27,15 0 35@26,16 2 52@24,16 3

These are blocks of the form 't@x,y r', in which t denotes the tile number as given in tileSet, x and y are the coordinates on the board, and r is the rotation of the tile.
The coordinate system is a grid, but only half of the positions are legal ones. To achieve a hexagonal grid, x and y must both be odd or both be even. Given the coordinates x and y of a space, the adjacent spaces can be reached as described in table 4.1. The last character of each block denotes the rotation of the tile in clockwise direction. Figure 4.2 illustrates how tile 49 looks after a rotation of 1. The value of t is always between 1 and 56, while the rotation r is always a number between 0 and 5. The last method, moveSummary, can give this as output for figure 4.1:
Direction    x-coordinate    y-coordinate
Northeast    x + 1           y − 1
East         x + 2           y
Southeast    x + 1           y + 1
Southwest    x − 1           y + 1
West         x − 2           y
Northwest    x − 1           y − 1

Table 4.1 Adjacent spaces in the Tantrix coordinate system
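The offsets in table 4.1 can be written down directly. The sketch below is illustrative (the class name HexGrid is not from Oliver); note how every step preserves the parity of x + y, which is what makes a square grid behave like a hexagonal one.

```java
// A sketch of the neighbour offsets from table 4.1.
public class HexGrid {
    // Direction order: NE, E, SE, SW, W, NW
    public static final int[] DX = { 1, 2, 1, -1, -2, -1 };
    public static final int[] DY = { -1, 0, 1, 1, 0, -1 };

    // A space is legal only when x and y are both odd or both even.
    public static boolean isLegal(int x, int y) {
        return ((x + y) % 2) == 0;
    }

    public static int[] neighbour(int x, int y, int direction) {
        return new int[] { x + DX[direction], y + DY[direction] };
    }
}
```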
0 false

The first part of the string denotes the number of the rack, as described in rackSummary, of the player the robot has to provide a move for. The second part indicates whether the free move has already been played or still has to be played.
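A small parser for the boardSummary format described above could look as follows. This is a hypothetical sketch: the class BoardSummaryParser and the PlacedTile type are not part of the robot kit, and the code assumes the 't@x,y r' blocks are separated by whitespace, as in the example output.

```java
// A hypothetical parser for the 't@x,y r' blocks returned by boardSummary.
public class BoardSummaryParser {
    public static class PlacedTile {
        public final int tile, x, y, rotation;
        public PlacedTile(int tile, int x, int y, int rotation) {
            this.tile = tile; this.x = x; this.y = y; this.rotation = rotation;
        }
    }

    public static java.util.List<PlacedTile> parse(String summary) {
        java.util.List<PlacedTile> tiles = new java.util.ArrayList<>();
        // tokens come in pairs: "t@x,y" followed by the rotation "r"
        String[] tokens = summary.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String[] at = tokens[i].split("@");
            String[] xy = at[1].split(",");
            tiles.add(new PlacedTile(Integer.parseInt(at[0]),
                                     Integer.parseInt(xy[0]),
                                     Integer.parseInt(xy[1]),
                                     Integer.parseInt(tokens[i + 1])));
        }
        return tiles;
    }
}
```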
4.1.2 Advantages and limitations of the interface
There were many advantages to using the framework from Tantrix.com. First of all, the design and implementation of a user interface to visualise the board with tiles and the racks of the players would be difficult. Furthermore, it allows the robot to run in the same environment where it has to run after the release. Finally, the kit also takes care of saving every game; afterwards, any game can be reviewed with the reviewer at any moment. Tantrix is a game in which visualisation is very important: with the information provided by the framework as described in section 4.1.1, it is impossible for humans to understand the game situation without the use of real tiles or software to visualise it.

In order to develop a high-end robot, there are some shortcomings in the interface. Firstly, it would be helpful to be able to let the robot evaluate moves suggested in a reviewer mode; it would be very interesting to know what score the robot assigns to the different moves. Furthermore, it is not possible to replay a certain game from a certain move, which would be useful for debugging the program. Finally, it would also be nice if it were possible to visualise the search tree. These features would increase the chances of developing a better robot.

Future robots are likely to think while the opponent is on turn. In that particular case, a callback method would be needed to inform the robot about any change in the game state. In the current implementation, the representation of the game is rebuilt for every move to be made.
Figure 4.1 An example of a game situation (from Dave Dyer)

Figure 4.2 Tile 49 with rotation 0 and rotation 1
4.2 Design of Oliver
The design of Oliver was done according to the principles described by Larman [20] and by Gamma, Helm, Johnson and Vlissides [21]. A simplified class diagram is given in figure 4.3.

When the method DoTurnStep is called in Oliver, the instance of commonGame from the robot kit is translated to an instance of Game. No representation from the robot kit is shared, except for the four methods discussed in section 4.1.1. The output of these four methods is parsed. First, a tile bag is made which contains all the tiles. Then, a gameboard and two players are created. The tiles that need to be placed on the gameboard are retrieved from the tile bag.

When the objects with the state of the game are built, a new thread is started in Oliver which calls an instance of BestMoveSearcher. This class implements the search algorithms described in section 3.1. The game positions at the leaves of the tree are evaluated by an instance of Heuristic, which implements the heuristic algorithms described in section 3.3. For each node in the tree, an instance of MoveGenerator is called. The move generator generates an array of all legal moves in that particular game situation. It does this by trying to fit each tile from the rack of a player, in each possible rotation, in all possible spaces.

When a search algorithm searches for the best move in the game tree, it may change the game. Algorithms are allowed to put tiles on the gameboard, remove tiles from the racks or change the game situation in any way. It is the responsibility of the algorithms themselves to undo the changes they have made. In this way, unnecessary cloning of objects is avoided, which saves memory.
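The modify-and-undo pattern described above can be sketched as follows. This is a simplified stand-in, not Oliver's actual search: the Game and Move interfaces and the method names are illustrative, and the alternation of maximising and minimising players is abstracted into a single flag.

```java
// A minimal sketch of searching by mutating shared state and undoing it,
// which avoids cloning the game objects at every node.
public class UndoSearch {
    public interface Move {}
    public interface Game {
        void play(Move m);     // put the tile on the board, remove it from the rack
        void undo(Move m);     // restore the board and the rack
        Move[] legalMoves();
        double evaluate();
        boolean maximising();
    }

    public static double search(Game game, int depth) {
        Move[] moves = game.legalMoves();
        if (depth == 0 || moves.length == 0) return game.evaluate();
        double best = game.maximising() ? Double.NEGATIVE_INFINITY
                                        : Double.POSITIVE_INFINITY;
        for (Move m : moves) {
            game.play(m);                        // mutate the shared state...
            double score = search(game, depth - 1);
            game.undo(m);                        // ...and restore it afterwards
            best = game.maximising() ? Math.max(best, score)
                                     : Math.min(best, score);
        }
        return best;
    }
}
```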
4.3 Statistics of the game

4.3.1 Storing information
When implementing or changing code in the core algorithms of the robot, it is often difficult to tell at first sight whether any improvement has been made. Furthermore, it was also necessary to retrieve information about the structure and the branching of the game tree. Therefore, two additional singleton classes were introduced. These two classes were removed in the version that is playing in the lobby now.

The class Statistics takes care of information and statistics during the game. It collects information about the branching and the number of game positions evaluated by the heuristic function, depending on whether or not the move is forced and which algorithm is used for a certain depth and maximum forced depth. The class ResultTable is called by Oliver when a game is finished. This table records the results of all the games. The results of these two singleton classes are passed to a MySQL database. This database contains four tables: free branching, forced branching, tree size and played games.
Figure 4.3 A class diagram of Oliver (showing the classes Oliver, Game, Player, Tile, TileBag, GameBoard, Move, BestMoveSearcher, Heuristic, MoveGenerator, Line, Statistics and ResultTable)
Free and forced branching

In these two tables, information is recorded about the branching. There are two fields, minimum and maximum, that contain the smallest and largest branching values ever encountered. Two other fields contain the average and the number of times the table has been updated. With these two fields, a new average a_new can be calculated from a new value v, the stored average a and the number of times the table was updated n by using this formula:

a_new = (v + n × a) / (n + 1)    (4.1)
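Equation 4.1 is the standard incremental average; a small sketch (the class name RunningAverage is illustrative):

```java
// The incremental average of equation 4.1: update the stored average
// without keeping the individual observations.
public class RunningAverage {
    private double average = 0.0;
    private int n = 0;   // number of updates so far

    public void update(double v) {
        average = (v + n * average) / (n + 1);
        n++;
    }

    public double average() { return average; }
    public int updates() { return n; }
}
```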
Tree size

This table stores information about the number of evaluated game positions. This number depends on the move number, the standard depth, the maximum depth with heuristic continuation, the algorithm used and whether or not the move is forced. All of these are stored. When a combination of these circumstances occurs again, the corresponding row is updated; otherwise, a new row is inserted in the database. The stored information is again the minimum and maximum, together with the average calculated by equation 4.1 and the number of updates.

Played games

When a game has been played, the final result is stored in this table. The table has five columns: the names of both players, the scores of both players and the date when the game was played.
4.3.2 Retrieving information
To retrieve information from the database, some additional tools were written. CollectBranching and CollectNodeEvaluations are two programs that dump the contents of the free branching and tree size tables into a file with a space as delimiter. The most interesting tool is TournamentScore. This program must be called with the names of two players; it searches the database for all the games played between these two players, processes the end scores of all these games and prints the winning percentage, the tournament score and the total number of played games on the screen. The calculation of these scores is covered in section 5.1. With this program it is possible to determine how good the implementation is relative to GoodBot and Robot. Changes to the code are good if they give better results.
4.4 A distributed implementation

4.4.1 'Monte Carlo' evaluation
The implementation as described above supposes that no new tiles come in from the bag. This assumption was made as it is infeasible to evaluate all possible tile replacements on a player's rack. However, in a distributed system, tile replacement can be taken into account to some extent, depending on the number of computers that can be used.

In this implementation, a central server randomly chooses, if possible, as many different sequences of tiles as there are computers available. Then, each sequence is given to a computer together with the game situation. All the computers use the algorithms described above. A timer is also set to five minutes. When a computer has given a score to all the possible moves at a certain game position, the scores are normalised. The normalisation takes care of a fair election: the best move receives one point, the worst move zero points, and all the other moves receive a score between zero and one.

For the last four moves, there are 4! = 24 possible different sequences for the tiles. This approach can consider all these 24 possibilities in a reasonable amount of time. Earlier in the game, there are many more possible sequences. If the maximum depth with heuristic continuation is set at 8, the number of possible sequences of 8 tiles from a bag with 10 tiles is:

10P8 = 10! / (10 − 8)! = 1,814,400

As described in chapter 5, this implementation gives better results than the normal implementation.
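The normalisation step described above can be sketched as a linear rescaling. The class name VoteNormaliser is illustrative, and the handling of the all-scores-equal case is an assumption not spelled out in the text.

```java
import java.util.Arrays;

// Sketch of the 'Monte Carlo' election normalisation: each host rescales
// its move scores so that its best move is worth one point and its worst
// move zero points.
public class VoteNormaliser {
    public static double[] normalise(double[] scores) {
        double min = Arrays.stream(scores).min().orElse(0.0);
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] votes = new double[scores.length];
        if (max == min) {
            // assumption: if all moves score equally, each gets a full vote
            Arrays.fill(votes, 1.0);
            return votes;
        }
        for (int i = 0; i < scores.length; i++)
            votes[i] = (scores[i] - min) / (max - min);
        return votes;
    }
}
```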
4.4.2 An alternative approach
With the 'Monte Carlo' approach, the number of possible sequences is huge. A smaller and better set of tiles can be obtained by assuming that only one, two, three or four tiles enter the game, according to what is feasible at that stage of the game. When thinking at a depth of five or more, this approach assumes that no more tiles come in. In the beginning, during the startgame, the algorithm can run and give each computer one different tile. When there are fewer tiles in the bag, it becomes possible to distribute every possible sequence of two tiles. Depending on the system that is used, this is possible when there are 10 tiles left in the bag. The number of possible ways to pick a sequence of two tiles from a bag containing 10 tiles is:

10P2 = 10! / 8! = 90

When there are six tiles left in the bag, it becomes possible to provide the computers with sequences of three tiles. The number of possible ways to pick a sequence of three tiles out of a bag of six tiles is:

6P3 = 6! / 3! = 120

When there are five tiles left, all possible sequences can be evaluated, as there are only 5! = 120 different sequences. This algorithm avoids the fault made in the normal and Monte Carlo approaches. In the normal way, running on a single computer, it is assumed that no tiles come in from the bag. On the other hand, the Monte Carlo approach assumes that any tile can enter the game at any moment, which is not manageable. The approach from this section, however, takes the middle ground.
4.4.3 The difference with the 'iterative filling of interesting places' heuristic
The approach of using the tiles from the bag while searching is not the same as what happens in the 'iterative filling of interesting places' heuristic described in section 3.3.4. The heuristic estimates the long-term effect of a certain move. The alternative approach, on the other hand, captures the short-term merit of a certain move: it evaluates, for example, what happens if a block cannot be held as long as estimated, and it also sees the short-term danger and merit of certain (un)lucky tiles that can be drawn from the bag.
4.4.4 Implementation of the 'Monte Carlo' approach
The 'Monte Carlo' approach, as described in section 4.4.1, was implemented as shown in the class diagram in figure 4.4. In this system, the class DistOllie implements the interface from listing 4.1. It is possible to play against this implementation by means of the robot kit.

The implementation consists of two parts, separated by a dashed line in figure 4.4. The upper part is the local server and has to run on the computer you play on. This local server is responsible for accepting the information from the robot kit and for distributing game situations and tile sequences to the available computers. The lower part is the remote server. Every participating computer has to run a copy of this program.

This system was implemented using remote method invocation from Java. Java RMI is described by Coulouris, Dollimore and Kindberg [23]. Java RMI extends the Java object model to provide support for distributed objects in the Java language. It allows objects to invoke methods on remote objects using the same syntax as for local invocations. In addition, type checking applies equally to remote and local invocations. However, an object making a remote invocation is aware that its target is remote, as it must handle RemoteExceptions. The implementor of a remote object is also aware that it is remote, because it must implement the Remote interface.

Some classes from figure 4.3 are omitted in figure 4.4. However, both the local and the remote server use the classes from figure 4.3 for representation. Furthermore, they also use the same heuristic and move generator. As incoming tiles are taken into account, the remote servers use a slightly adapted search algorithm which is able to pick tiles from the bag and put them on the rack and vice versa. Finally, the classes for recording statistics and results are used here too.

When DistOllie is initialised by a call to the InitRobot method, the instance asks the DistriMoveSearcher to test the connections. This is done in the method pingAllRemoteServers in a singleton instance of LocalServer. In this method, a list of all available hosts is iterated and the method ping is invoked on every remote server with the hostname of the local server as parameter. Then, all RemoteServer objects start an instance of a PingReplyThread. Such a thread calls the method pong on the local server. The result of this procedure is twofold. Firstly, the local server knows which hosts are really available and prepared to accept game positions and a tile sequence. Secondly, all the remote servers know the hostname of the local server; for any future request, they know to which host they have to send the answer.

When DistOllie receives the DoTurnStep call from the robot kit, it makes an instance of Game. Then, this instance is sent to an instance of DistriMoveSearcher. The instance asks the tile bag for a set of different tile sequences. The number of generated sequences is at most the number of available computers. These tile sequences are passed to the singleton instance of LocalServer. The LocalServer contacts all available remote servers with the game situation and a different sequence of tiles.

When all the available hosts have been contacted, a timer is set for a five-minute countdown. Every host calculates the scores for every possible move. To have a fair election, the scores assigned to the moves must be between zero and one. If the timer goes off, if all the hosts have answered or if a move is leading significantly in the election process, the local server calls stopThinking on each remote server and the elected move is played. The LocalServer singleton has three states: pinging, searching and available. When the state is pinging or searching, new jobs are rejected.
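The election the LocalServer runs over the hosts' answers can be sketched in a single process, leaving out the RMI plumbing. The class name MoveElection and the array layout are illustrative assumptions: each host contributes one normalised vote per candidate move, and the move with the highest total wins.

```java
// A simplified, single-process sketch of the LocalServer's election:
// every host returns normalised votes (0..1) for the same candidate moves,
// and the move with the highest total is played.
public class MoveElection {
    // votesPerHost[h][m] is host h's normalised vote for move m
    public static int elect(double[][] votesPerHost) {
        double[] totals = new double[votesPerHost[0].length];
        for (double[] hostVotes : votesPerHost)
            for (int m = 0; m < hostVotes.length; m++)
                totals[m] += hostVotes[m];
        int best = 0;
        for (int m = 1; m < totals.length; m++)
            if (totals[m] > totals[best]) best = m;
        return best;
    }
}
```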
Figure 4.4 A class diagram of the distributed version (showing DistOllie, DistriMoveSearcher, LocalServer, LocalServerInterface, PingReplyThread, GameVoteThread, RemoteServer and RemoteServerInterface)
Chapter 5 Results

5.1 Scoring methods
In order to compare various versions of a robot and to know its position relative to other players and robots, some kind of scoring system is needed.
5.1.1 Winning percentage
A first and trivial scoring system is to consider the number of games Oliver wins against GoodBot or Robot over a significant number of games. Figure 5.1 illustrates how this winning percentage evolves and becomes stable as more and more games are played. As explained in Mitchell [18], playing multiple games of Tantrix with a robot is a Bernoulli experiment if we only distinguish between winning and not winning (loss or draw). During the experiments, neither GoodBot nor Oliver changes by learning, and the probability p that Oliver wins a game remains constant. Furthermore, the experiments are independent: the chance that Oliver wins a game is not influenced by previous wins or losses.

Let the random variable X_n denote the number of wins in a sample of n games. Then for 0 ≤ w ≤ n, the probability Pr(X_n = w) is given by

Pr(X_n = w) = n! / (w! (n − w)!) × p^w (1 − p)^(n−w)    (5.1)

The expected, or mean, value of X_n is

E[X_n] = n p    (5.2)

The variance of X_n is

Var(X_n) = n p (1 − p)    (5.3)

The standard deviation of X_n is

σ_(X_n) = √(n p (1 − p))    (5.4)
A confidence interval for the proportion

By applying the central limit theorem to the random variable P̂_n = X_n / n, which denotes the ratio of wins in a sample of n games, we find that

(P̂_n − p) / √(p (1 − p) / n) ~ N(0, 1)    (5.5)

as n tends to infinity. N(0, 1) is a normal distribution with zero mean and standard deviation equal to 1. An estimate for a two-sided confidence interval with significance level α of a proportion takes the form

[ P̂_n − z_α √(P̂_n (1 − P̂_n) / n) , P̂_n + z_α √(P̂_n (1 − P̂_n) / n) ]    (5.6)
The appropriate values for z_α can be found in table 5.1.

Confidence level α:  50%   68%   80%   90%   95%   98%   99%
Constant z_α:        0.67  1.00  1.28  1.64  1.96  2.33  2.58

Table 5.1 The value z_α for two-sided confidence intervals with significance level α
In figure 5.1, 1051 games were played, of which 57.8% were won by Oliver. This means that with a confidence of 95%, the actual probability of winning a game lies in the interval

0.578 ± 1.96 √(0.578 (1 − 0.578) / 1051) = 0.578 ± 0.03 = [0.548, 0.608]

Note that this interval is quite large.
If p is 57.8%, the number of games to be played to have a confidence interval with significance level 95% and a width of 2% follows from

1.96 √(0.244 / n) = 0.01

which gives n = 9374. So over 9000 games of Tantrix should be played to have an estimate accurate to ±1%.
5.1.2 Tournament score
In Tantrix tournament games, not only the number of games you win or lose counts, but also the difference in points. In the rules on the tournament homepage [25], the so-called tournament points are described. They are based on sharing out 20 tournament points for each game. Each player starts with 10 tournament points. The winning player gets 2 win bonus points plus 4/3 × √(winning margin). Table 5.2 is easier to use. If, for example, a game is played with a final score of 23-18, the winner receives 15 tournament points and the loser 5 tournament points.

If you have played n games and received tp tournament points, the tournament score ts is given by

ts = tp / (20 n)    (5.7)

A tournament score ts = 67% means that on average you have defeated your opponent by 1 point. Achieving a tournament score of 67% is quite difficult. When improving a robot, the tournament score achieved in a competition against another robot does not differ a lot. As shown in figure 5.2, the tournament score varies less than the winning percentage, but it is also less significant: an increase of 1% is already a considerable improvement.
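The tournament-point rule can be sketched as follows. The class name TournamentPoints is illustrative, and the 4/3 × √margin bonus is the formula as reconstructed here from table 5.2, so treat the constants as an assumption rather than the official rule text.

```java
// Sketch of the tournament-point allocation: 20 points per game, the
// winner's share being 10 + 2 + 4/3 * sqrt(margin), capped at 20.
public class TournamentPoints {
    public static double winnerPoints(int margin) {
        if (margin == 0) return 10.0;   // draw: the points are shared evenly
        double tp = 10.0 + 2.0 + (4.0 / 3.0) * Math.sqrt(margin);
        return Math.min(tp, 20.0);      // a margin of 36+ takes all 20 points
    }

    // Equation 5.7: tournament score after n games.
    public static double tournamentScore(double pointsReceived, int gamesPlayed) {
        return pointsReceived / (20.0 * gamesPlayed);
    }
}
```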
5.1.3 Lobby ranking
Every player at Tantrix.com is ranked. Each time an online game is played, the rankings of the players involved are updated. Oliver, the robot of this project, is also ranked in the lobby ranking system. This ranking system is explained on Tantrix.com [26]; how the rankings are calculated is shown in algorithm 5.1.1. The input of this algorithm is the score and rank of both winner and loser. The algorithm returns the resulting rank of the winner and of the loser. With this algorithm, ranking tables are made and updated on Tantrix.com. They are used to order all the players relative to each other.
Figure 5.1 Winning percentage evolving as games are played

Figure 5.2 Tournament score evolving as games are played
However, this ranking system is designed with new users particularly in mind. When a new user creates an account on Tantrix.com to play online, he starts with a lobby ranking of 500. Players who have just started to play Tantrix mostly lose more games than they win, and the lobby ranking is designed to encourage them: a lobby ranking of 750 is not so difficult to reach, but once this threshold of 750 points is passed it becomes harder to earn points. Furthermore, it is not possible to exceed 1000 points; all of the best players score between 900 and 1000 points. A common strategy for users on Tantrix.com is to wait until a robot with a varying ranking has a high ranking: beating such a robot yields many more points. For these reasons, this ranking is not appropriate for the best Tantrix players; for them, another ranking called the master ranking is used. The lobby ranking is not a good ranking for robots either. It gives an idea, but it is not a strong indicator.

Algorithm 5.1.1: NewRankings(WScore, WRank, LScore, LRank)
    if WScore ≠ LScore then
        change ← √(WScore − LScore)
        worth ← 8 × change × √(LRank / WRank)
        handicapOfWinner ← √((1000 − WRank) / 1000)
        handicapOfLoser ← √((1000 − LRank) / 1000)
        WChange ← ⌊worth × handicapOfWinner⌋
        WRankNew ← WRank + WChange
        if LRank > 750 then
            LChange ← ⌊worth × (1 − handicapOfLoser) / 2⌋
        else
            LChange ← ⌊worth × (1 − handicapOfLoser)⌋
        LRankNew ← LRank − LChange
        return (WRankNew, LRankNew)
    else
        return (WRank, LRank)
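As a sketch, algorithm 5.1.1 can be transcribed directly into Java. The class and method names below are illustrative only (this is not the actual Tantrix.com code), and the integer truncation follows the floor brackets in the algorithm:

```java
public final class LobbyRanking {

    /**
     * Returns the new {winnerRank, loserRank} after a decisive game,
     * following algorithm 5.1.1. Ranks are unchanged on a draw.
     */
    public static int[] newRankings(int wScore, int wRank, int lScore, int lRank) {
        if (wScore == lScore) {
            return new int[] { wRank, lRank }; // draw: nothing changes
        }
        double change = Math.sqrt(wScore - lScore);
        // Beating a higher-ranked opponent is worth more points.
        double worth = 8 * change * Math.sqrt((double) lRank / wRank);
        double handicapWinner = Math.sqrt((1000.0 - wRank) / 1000.0);
        double handicapLoser = Math.sqrt((1000.0 - lRank) / 1000.0);
        int wChange = (int) Math.floor(worth * handicapWinner);
        int lChange;
        if (lRank > 750) { // past the 750 threshold, losses are halved
            lChange = (int) Math.floor(worth * (1 - handicapLoser) / 2);
        } else {
            lChange = (int) Math.floor(worth * (1 - handicapLoser));
        }
        return new int[] { wRank + wChange, lRank - lChange };
    }
}
```

For example, a 20–10 win between two 500-ranked players moves the winner to 517 and the loser to 493; since both handicaps shrink as ranks approach 1000, gains taper off for the strongest players, exactly as the text describes.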
5.1.4 Master ranking and ELO rating
As the lobby ranking is not a good ranking for good players, a ranking was introduced especially for master players. Along with their ELO rating, the master ranking is a good indication of a player's strength. A new master starts with 1200 points. The difference between the master rankings of two players is their expected difference in points multiplied by 1000: if two masters differ by 1000 ranking points, the expected difference in score is 1 point. There is
also an ELO rating for Tantrix. However, no robot has been ranked in the master rankings or the ELO rating so far.

Master level

As described on Tantrix.com [27], the 120 currently best Tantrix players are called masters. To become a master, certain criteria have to be fulfilled: a lobby ranking of 950 points must be reached, and an ELO rating of 1775 must be gained by playing tournaments. To maintain the master status, a master has to play 5 master games every three months. If the lobby ranking later drops below 950, this does not affect the master status.
Table 5.2: Tantrix Tournament Points

Margin of victory   TPs winner   TPs loser   TPs difference
0                   10.0         10.0        0.0
1                   13.3          6.7        6.6
2                   13.9          6.1        7.8
3                   14.3          5.7        8.6
4                   14.7          5.3        9.4
5                   15.0          5.0        10.0
6                   15.3          4.7        10.6
7                   15.5          4.5        11.0
8                   15.8          4.2        11.6
9                   16.0          4.0        12.0
10                  16.2          3.8        12.4
11                  16.4          3.6        12.8
12                  16.6          3.4        13.2
13                  16.8          3.2        13.6
14                  17.0          3.0        14.0
15                  17.2          2.8        14.4
16                  17.3          2.7        14.6
17                  17.5          2.5        15.0
18                  17.7          2.3        15.4
19                  17.8          2.2        15.6
20                  18.0          2.0        16.0
21                  18.1          1.9        16.2
22                  18.3          1.7        16.6
23                  18.4          1.6        16.8
24                  18.5          1.5        17.0
25                  18.7          1.3        17.4
26                  18.8          1.2        17.6
27                  18.9          1.1        17.8
28                  19.1          0.9        18.2
29                  19.2          0.8        18.4
30                  19.3          0.7        18.6
31                  19.4          0.6        18.8
32                  19.5          0.5        19.0
33                  19.7          0.3        19.4
34                  19.8          0.2        19.6
35                  19.9          0.1        19.8
36+                 20.0          0.0        20.0
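In code, table 5.2 reduces to a clamped array lookup; note that winner and loser points always sum to 20. A sketch (the class name is illustrative; the array simply transcribes the winner column of table 5.2):

```java
public final class TournamentPoints {

    // Winner's tournament points, indexed by margin of victory (table 5.2).
    private static final double[] WINNER_TPS = {
        10.0, 13.3, 13.9, 14.3, 14.7, 15.0, 15.3, 15.5, 15.8, 16.0,
        16.2, 16.4, 16.6, 16.8, 17.0, 17.2, 17.3, 17.5, 17.7, 17.8,
        18.0, 18.1, 18.3, 18.4, 18.5, 18.7, 18.8, 18.9, 19.1, 19.2,
        19.3, 19.4, 19.5, 19.7, 19.8, 19.9, 20.0
    };

    /** Winner's TPs for a given margin; margins of 36 or more all score 20. */
    public static double winnerTps(int margin) {
        return WINNER_TPS[Math.min(margin, 36)];
    }

    /** Loser's TPs: winner and loser always share 20 points. */
    public static double loserTps(int margin) {
        return 20.0 - winnerTps(margin);
    }
}
```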
5.2 Score of the implementations

5.2.1 GoodBot vs Robot
As a reaction to this project, GoodBot was developed by Dave Dyer, who had programmed Robot before. GoodBot is the result of removing the limitations on the search of Robot. As a result, it plays better, but also more slowly. The results of 554 games between Robot and GoodBot are shown in table 5.3. The test was run by Dave Dyer. Equation 5.6 with a confidence level of 95% is applied to all winning percentages mentioned from this point onwards.

Table 5.3: Results of Robot in competition with GoodBot

           Tournament points   Winning percentage
  Robot    48%                 (43 ± 4)%
  GoodBot  52%                 (49 ± 4)%

5.2.2 Implementation of the virtual line heuristic
In a competition of 128 games against Robot, an implementation which only takes virtual line length into account achieved the results in table 5.4. This implementation also smoothed to real line length at the end of the game, similar to the method described in section 3.3.5. 5% of the games were draws. The results of the same implementation in competition with GoodBot are given in table 5.5. They show that this implementation was already significantly better than GoodBot.
Table 5.4: Results of an implementation with virtual line length in competition with Robot (128 games)

                 Tournament points   Winning percentage
  Oliver ver. I  56%                 (57 ± 8.6)%
  Robot          44%                 (38 ± 8.4)%

Table 5.5: Results of an implementation with virtual line length in competition with GoodBot (391 games)

                 Tournament points   Winning percentage
  Oliver ver. I  53%                 (52 ± 5)%
  GoodBot        47%                 (42 ± 5)%
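Equation 5.6 is not reproduced in this chapter; assuming it is the standard 95% Wald interval for a binomial proportion — an assumption, but one that reproduces the margins quoted in these tables — it can be sketched as:

```java
public final class WinRate {

    /**
     * 95% Wald confidence half-width for a winning percentage p
     * observed over n games: 1.96 * sqrt(p * (1 - p) / n).
     */
    public static double margin95(double p, int n) {
        return 1.96 * Math.sqrt(p * (1 - p) / n);
    }
}
```

For Robot's 43% over the 554 games of table 5.3, `margin95(0.43, 554)` gives a half-width of about 0.041, matching the quoted (43 ± 4)%.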
5.2.3 Implementation of the fair share line length heuristic
As shown in tables 5.6 and 5.7, better results were achieved after implementing the fair share line heuristic. The virtual line length used here is not exactly the same as the one described in section 3.3.2: only later were points subtracted for gaps.

Table 5.6: Results after adding fair share line length in competition with Robot (287 games)

                  Tournament points   Winning percentage
  Oliver ver. II  61%                 (66 ± 5)%
  Robot           39%                 (27 ± 5)%

Table 5.7: Results after adding fair share line length in competition with GoodBot (1051 games)

                  Tournament points   Winning percentage
  Oliver ver. II  56%                 (57 ± 3)%
  GoodBot         44%                 (38 ± 3)%

5.2.4 Distributed implementation
Table 5.8 shows the results of a distributed implementation. This implementation uses the 'Monte Carlo' approach described in section 4.4.4. The remote clients are only used for the last 10 moves before the tile bag is empty; for the first 34 moves, the normal implementation described in the previous section was used. In earlier tests, random tile sequences were distributed from the start of the game, but this performed worse, so the decision was taken to distribute only in the last moves. Comparing table 5.7 with table 5.8 shows that the distributed approach did not yield better results. The reason for these results, together with an improvement to the algorithm, is discussed in section 4.4.2.

Table 5.8: Results of a distributed implementation in competition with GoodBot (220 games)

           Tournament points   Winning percentage
  Oliver   56%                 (57 ± 7)%
  GoodBot  44%                 (36 ± 6)%

5.2.5 Final implementation
While writing this text and packaging the version that plays on Tantrix.com, some remaining bugs were removed. This also led to a significant improvement of the robot.
Table 5.9: Results of the final version in competition with Robot (430 games)

           Tournament points   Winning percentage
  Oliver   60%                 (66 ± 4)%
  Robot    40%                 (29 ± 4)%

Table 5.10: Results of the final version in competition with GoodBot (426 games)

           Tournament points   Winning percentage
  Oliver   58%                 (64 ± 4.6)%
  GoodBot  42%                 (31 ± 4.4)%

The battle of the bots

On the 11th of May 2005, an official online battle was held between GoodBot and Oliver. The battle was open to spectators. As expected, Oliver defeated GoodBot. The results of this battle are given in table 5.11.

Table 5.11: Results of the official Tantrix battle Oliver vs. GoodBot

           Tournament points   Winning percentage
  Oliver   59%                 (65 ± 7)%
  GoodBot  41%                 (31 ± 6)%
Lobby ranking and overall winning percentage

The rankings of all robots can be seen on Tantrix.com [28]. So far, Oliver's lobby ranking has varied between 800 and 950 points. This gives an indication, but it is not a reliable measure: the score varies too much and cannot be used to rank Oliver relative to other players. The most obvious way to rank Oliver relative to other players is to let him participate in tournaments. On the same page, the overall winning percentage is also mentioned. As both GoodBot and Oliver play against the same public, this is a good indicator in the long term. Since Oliver was introduced on Tantrix.com this score has been decreasing. A reason could be that people have become more familiar with Oliver's strategy by now; they might also be aware of some weak points of the robot and exploit these. Since Oliver is stronger than GoodBot, it is also possible that weaker players prefer weaker robots as their opponents.
5.3 Statistics of the game

5.3.1 Branching
Figure 5.3 shows the minimum, maximum and average branching. The branching is the total number of legal moves at a certain move number, and it depends on that move number. For the first move, there are only six possible ways to play. When the bag is empty, there are suddenly many more moves. The figure shows that in midgame (moves 10 to 44) the average branching is approximately 46. During the first 10 moves, branching is a little higher; it decreases from move 10 on as more sides of the game board become blocked. The figure also shows that branching can deviate considerably from the average in both directions. In fact, in Tantrix the branching can even be 0, although this happens only in about one out of 10,000 games. In this figure, branching was only taken into account if the move was not forced; for forced moves, branching is typically less than 5.
5.3.2 Evaluated nodes
The total number of game situations evaluated by the heuristic for each move is shown in figure 5.4. Only evaluations made while searching for a free move are counted, and the graph only includes data for moves searched with the alpha-beta algorithm at a standard depth of three and a maximum depth of eight. There is no data for the first move: as the implementation does not use an opening book, the program searches at a depth of four for the first move. In this way, Oliver takes more care not to end up in a short loop in startgame and tries to force his opponent into a short loop if possible. The average number of evaluated game positions is also related to the branching. After the first 5 moves, the average decreases as sides of the game board become blocked. During midgame, on average 5,000 game positions are evaluated to decide which move is best. A few moves before the tile bag is empty, the average starts to increase sharply: for move 45, the first move after the bag is empty, on average 43,000 game positions are evaluated. However, this does not mean that the robot takes four times as long to think. Both the virtual line length heuristic and the fair share heuristic take into account the tiles left in the bag and on the racks of the players; as the bag is empty or contains only a couple of tiles around the peak, clearly less effort is needed to evaluate a board position. Figure 5.3 also shows that branching can be higher in some situations; the number of evaluated board positions can then easily be ten times higher.
Figure 5.3: Minimum, maximum and average branching, based on 1200 games
Figure 5.4: Average number of evaluated game positions for a free move, based on more than 5000 games

5.4 Analysis of the style of playing

The analysis of Tantrix games is a specialisation in its own right. For this reason, Oliver played two games against Rob Morton, who finished second in the Tantrix world championship of 2004. The games were saved as U!cheekymbk-23-Oliver-18-2005-05-09-1752 and U!Oliver-15-cheekymbk-25-2005-05-09-1826. Both games were won by Rob Morton. During both games, Rob observed that Oliver made some strange choices, and after each game he pointed out the key moves that made Oliver lose. A complete analysis can be found in appendix C. Here we limit the discussion to the key moves.
5.4.1 The first game
This game was saved as U!cheekymbk-23-Oliver-18-2005-05-09-1752. The key move in this game was move 16; the result of this move is shown in figure 5.5. Here, Oliver tries to waste yellow tiles and blocks the end point of the opponent's line. By doing this, however, he takes a risk. The best move for Oliver would have been the one shown in figure 5.6, which diverts the yellow line in the other direction, making it much more difficult for yellow to connect. After move 16, a forced space was created, and a tile from Oliver's rack fitted in it. At move 18 the resulting game situation was as shown in figure 5.7. Of the 27 tiles left in the tile bag, 2 fit in the forced space, so the chance of drawing a fitting tile is 2/27. Rob was lucky to draw a fitting tile in one of the next moves; the resulting game situation is shown in figure 5.8. This situation is worse for Oliver, as Rob was able to connect all yellow tiles on the board indirectly. The reason Oliver played this move is that he is only aware of the fair share line length: the 'iterative filling of interesting places' heuristic, described in section 3.3.4, is not implemented in the version that is currently playing online.
Figure 5.5: A key move by Oliver in this game
Figure 5.6: This would have been a better move
Figure 5.7: Intermediate result at move 18
Figure 5.8: The result at move 19 of the mistake in move 16
5.4.2 A second game
In this game, saved as U!Oliver-15-cheekymbk-25-2005-05-09-1826, a strange move by Oliver at the very beginning had great consequences later in the game. The key move was move 3; the game situation at move 3 is shown in figure 5.9. Here Oliver could not connect the red corner to the straight red line. As he only takes the longest line into account, he gives up the red corner, decides to waste yellow tiles and makes an indirect line of three tiles. The result of this move is shown in figure 5.10. At move 5, yellow cannot connect his two lines, making Oliver think he is in a winning position. Later in the game, however, Oliver's mistake is exploited: at move 22 yellow manages to connect the two lines, resulting in a bad game position for Oliver. In figure 5.12, the initial game position can be recognised around the red loop of 3 tiles. As Oliver was not able to connect the two red pieces, he dropped the first piece and tried to extend his line while wasting yellow. This was successful for him, as yellow ended up with two separate lines that could not be connected in the next move. A well-tuned 'iterative filling of interesting places' heuristic would not have seen this either, as after the filling of a created forced space, the two lines are still not connected. However, in startgame, Oliver should not have given up the second longest line so soon: in startgame, the lines still lie close to each other and the probability is high that they will eventually be connected.
Figure 5.9: At move 3 Oliver makes a mistake
Figure 5.10: The result of move 3
Figure 5.11: This would have been a better move
Figure 5.12: At move 22 the mistake of move 3 is exploited
Chapter 6

Future research

As the current implementation does not defeat all human Tantrix players in the world, there is still room for improvement. The robot can be improved in two ways. Firstly, the program can be changed to play faster. Secondly, the existing algorithms can be tuned and new algorithms can be developed to make better decisions.
6.1 Faster play
In this section, some possible improvements are suggested for faster play.

• For the first move, the current implementation searches at a depth of four, instead of the depth of three used later in the game. This could easily be done, as there are only six possibilities for the first move. Still, when Oliver is on turn for the first move, he takes rather a long time to think in comparison to human players. Strategies for playing the first moves are well known, as explained in section 2.2.1; for the first two moves, this knowledge could be hard-coded, allowing the robot to decide immediately about its first move.

• In endgame, game situations can arise in which the final score is already known: no matter what the players do, the score cannot be influenced any more. If such situations were detected, a random move could be played instantaneously.

• During the implementation of the program, a higher priority was given to the results than to the speed of every part. As it was not known what each next step towards a better robot would be, simple implementations were, at times, preferred over faster but more complex ones. However, this does not mean that the implementation is slow now: in midgame, for instance, its speed is comparable to GoodBot's.
• Besides alpha-beta, a variety of other search algorithms exists. Algorithms like NegaScout, SSS* and MTD(f) find the same move as alpha-beta, but usually in less time and with fewer evaluations of game positions.

• It is possible that the alpha-beta search algorithm encounters multiple leaves representing the same game situation. Suppose that, firstly, the player on turn can choose between two free moves A and B. Secondly, suppose that neither of these moves leads to a forced space for either player. Furthermore, assume that the opponent can react with move C, which leads to a forced space for neither player. The game situation that arises at the end of the path A, C, B is then the same as the one at the end of the path B, C, A. Transposition tables could be used to avoid multiple evaluations of identical game situations.

• The part of the board evaluation that is the same for sibling nodes can be saved rather than recomputed for each move.
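A transposition table of this kind could be sketched as a map from a position key to its evaluated score, consulted before a node is searched. The sketch below is illustrative; a real implementation would want an incrementally updated, Zobrist-style hash of the Tantrix board as the key:

```java
import java.util.HashMap;
import java.util.Map;

public final class TranspositionTable {

    // Maps a (hashed) board position to its evaluated score.
    private final Map<Long, Double> scores = new HashMap<>();

    /** Stores the evaluated score for a position key. */
    public void put(long positionKey, double score) {
        scores.put(positionKey, score);
    }

    /** Returns the cached score, or null if this position was never searched. */
    public Double lookup(long positionKey) {
        return scores.get(positionKey);
    }
}
```

Before evaluating the leaf reached via A, C, B, the search would look up the key, find the score already stored when B, C, A was searched, and skip the duplicate evaluation.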
6.2 Better play
These are some suggestions that could lead to a higher-ranked robot.

• An interface with rich functionality for Tantrix robot development is needed to get better results. This interface should allow the programmer to visualise the algorithms, to suggest alternative moves and see their assigned scores, and to replay any game from any position onwards, including older games from the database.

• The heuristic function contains many parameters that were chosen rather ad hoc. The improvement would be considerable if these parameters could be learned; which machine learning methods are appropriate should be analysed carefully.

• A flawless and well-tuned implementation of the 'iterative filling of interesting places' heuristic might be finished in the future. Such an implementation can only work fast enough in combination with the techniques mentioned in the previous section.

• The 'Monte Carlo' approach could be corrected towards the alternative approach suggested in section 4.4.2.

• The order of filling forced spaces is arbitrary when the choices are independent; the best one should be filled first.
Chapter 7

General conclusion

In this thesis, a design and an implementation were made of a program for playing Tantrix. Tantrix is a strategic board game which consists of 56 hexagonal tiles, each carrying three coloured lines. The game, described in chapter 1, was invented by Mike McManaway and is available as a game pack. It can also be played online on Tantrix.com. Several robots have been created so far to play this game, but they can easily be beaten by advanced players. In this thesis, I have made a new robot, called Oliver.

Following the human strategies explained in chapter 2, some algorithms were presented in chapter 3 to decide on the best move for a given game situation in Tantrix. These algorithms are all based on a classical approach using minimax-like algorithms and heuristic evaluations of game positions. Both minimax and alpha-beta were implemented in this thesis. They search at a depth of three; this depth is reasonable, as the game situation changes too much after three moves. However, when forced moves are involved, the game situation is investigated at a greater depth too, so that the effect of forced moves is evaluated on a longer term.

Next, some heuristics were introduced, giving the ability to map Tantrix game positions onto the real numbers in order to estimate the winning chance of the player on turn. Making a good evaluation function for Tantrix is a hard problem. The simplest heuristic is the difference between the highest scoring lines or loops of both players, called the real line length. This heuristic is too simple, as it does not take into account lines that contain temporary gaps; therefore, the virtual line length was introduced. Additionally, a method was introduced to estimate the final line length by assuming that a certain number of the tiles left in the tile bag and on the racks of the players will be added to the longest line in the course of the game. This certain amount of tiles is a fair share, and therefore this heuristic is called the fair share line length. Finally, a last heuristic was introduced which takes the possible results of filling forced spaces into account. This heuristic is called the 'iterative filling of interesting places', as it iteratively fills the forced spaces.

In a first version, these ideas were implemented, as described in chapter 4, assuming that no tiles from the tile bag enter the game. The tiles from the tile bag are only considered in
the evaluation of probabilities. The final implementation has been released on Tantrix.com and is called Oliver. In this version, alpha-beta is implemented together with the first three heuristics. This version needs on average two minutes of total thinking time on a Pentium IV 2.5 GHz, while the limit for a game is fifteen minutes. Furthermore, this implementation is currently the best performing program for playing Tantrix, as the results in chapter 5 make clear. In a public challenge, the former best robot, programmed by Dave Dyer and called GoodBot, was defeated. In a series of 200 games between GoodBot and Oliver, Oliver won 129 games, lost 61, and drew 10.

A second version tried to take the incoming tiles from the bag into account. As the number of possible tile sequences is very large except for the last few moves before the tile bag is empty, this implementation randomly distributes a number of possible tile sequences to different computers, which vote for the best move. They do this only in the last ten moves before the tile bag is empty. However, this implementation did not perform significantly better than the first one; on the other hand, the results were not significantly poorer either. Some additional suggestions for further improvement were given in chapter 6.

I hope that, with this thesis, a next step has been made towards a program that challenges the best Tantrix masters.
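The voting scheme just described — several computers, each searching the same position under its own random tile sequence and reporting its preferred move — can be sketched as follows. The class and the string move labels are illustrative stand-ins for the real Move objects; only the vote-counting logic is shown:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class MonteCarloVote {

    /**
     * Each remote client searches the shared position under one random
     * tile sequence and reports its preferred move; the move with the
     * most votes is played.
     */
    public static String bestMove(List<String> votes) {
        Map<String, Integer> tally = new HashMap<>();
        for (String move : votes) {
            tally.merge(move, 1, Integer::sum); // count one vote per client
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tally.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

If three clients vote for move A and one each for B and C, move A is played, regardless of the individual scores the clients computed.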
Appendix A

Source code of the search algorithms

A.1 Minimax algorithm

A.1.1 Initialising minimax
private Move useMiniMax() throws SearchTerminatedException {
    double best = Double.NEGATIVE_INFINITY;
    double resultScore = 0;
    Move result = null;
    boolean forced;
    Statistics.getInstance().reset();
    if (getGame().getCurrentPlayer() != getGame().getRobotPlayer()) {
        System.out.println("Warning: we are not playing for robot?");
    }
    Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
    if (forcedMoves.length == 0 && getGame().hasPlayedFreeMove()) {
        System.out.println("Warning: nothing to do for me... no forced move "
                + "and I have already played a free move.");
    }
    if (forcedMoves.length == 1) {
        System.out.println("I'm forced and there is only one possibility");
        Statistics.getInstance().informPossibilities(
                getGame().getGameBoard().getNumberOfPlayedTiles() + 1, 1, true);
        return forcedMoves[0];
    }
    if (forcedMoves.length > 0) {
        forced = true;
        Statistics.getInstance().informPossibilities(
                getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
                forcedMoves.length, true);
        System.out.println("Looking for FORCED moves");
        for (int i = 0; i < forcedMoves.length; i++) {
            printMove(forcedMoves[i], 0);
            playMove(forcedMoves[i]);
            resultScore = minimax(1);
            undoMove(forcedMoves[i]);
            forcedMoves[i].setScore(resultScore);
            printMove(forcedMoves[i], 0);
            if (resultScore > best) {
                best = resultScore;
                result = forcedMoves[i];
                result.setScore(resultScore);
                System.out.println("Best move so far:");
                System.out.println(result.toString());
            }
        }
    } else {
        Move[] freeMoves = getMoveGenerator().getFreeMoves(getGame());
        forced = false;
        Statistics.getInstance().informPossibilities(
                getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
                freeMoves.length, false);
        System.out.println("Looking for FREE moves");
        getGame().setPlayedFreeMove(true);
        for (int i = 0; i < freeMoves.length; i++) {
            printMove(freeMoves[i], 0);
            playMove(freeMoves[i]);
            resultScore = minimax(1);
            undoMove(freeMoves[i]);
            freeMoves[i].setScore(resultScore);
            printMove(freeMoves[i], 0);
            System.out.println("Is " + resultScore + " better than " + best + "?");
            if (resultScore > best) {
                best = resultScore;
                result = freeMoves[i];
                result.setScore(resultScore);
                System.out.println("Best move so far:");
                System.out.println(result.toString());
            }
        }
    }
    Statistics.getInstance().informEndOfSearch(
            getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
            forced, getMaxDepth(), MAXFORCEDDEPTH, "minimax");
    return result;
}
A.1.2 The minimax algorithm
private double minimax(int depth) throws SearchTerminatedException {
    if (interrupted) {
        throw new SearchTerminatedException(
                "You don't want to know my result anymore!");
    }
    Statistics.getInstance().informEvalution();
    double result;
    if (depth == getMaxDepth() || depth == MAXFORCEDDEPTH
            || getGame().getGameBoard().getNumberOfPlayedTiles() == 56) {
        Statistics.getInstance().informEvalution();
        double score = getHeuristic().getScore(getGame());
        return score;
    }
    if (!getGame().getOtherPlayer().hasTilesLeft()) {
        Statistics.getInstance().informEvalution();
        return getHeuristic().getScore(getGame());
    }
    if (!getGame().getCurrentPlayer().hasTilesLeft()) {
        getGame().nextPlayer();
        result = minimax(depth);
        getGame().previousPlayer();
        return result;
    } else {
        if (getGame().getCurrentPlayer() == getGame().getOtherPlayer()) {
            // This is a MIN node
            result = Double.POSITIVE_INFINITY;
            if (!getGame().hasPlayedFreeMove()) { // Not yet played a free move
                Move[] children = getMoveGenerator().getForcedMoves(getGame());
                if (children.length != 0) { // There are forced spaces left
                    for (int i = 0; i < children.length; i++) {
                        printMove(children[i], depth);
                        playMove(children[i]); // Put tile from rack on gameboard
                        increaseDepth();
                        children[i].setScore(minimax(depth + 1));
                        decreaseDepth();
                        undoMove(children[i]); // Remove tile from gameboard to rack
                        printMove(children[i], depth);
                        if (children[i].getScore() < result) {
                            result = children[i].getScore();
                        }
                    }
                    return result;
                } else { // There are no forced spaces left, play the free move
                    children = getMoveGenerator().getFreeMoves(getGame());
                    if (children.length != 0) {
                        getGame().setPlayedFreeMove(true);
                        for (int i = 0; i < children.length; i++) {
                            printMove(children[i], depth);
                            playMove(children[i]); // Put tile from rack on gameboard
                            children[i].setScore(minimax(depth + 1));
                            undoMove(children[i]); // Remove tile from gameboard to rack
                            printMove(children[i], depth);
                            if (children[i].getScore() < result) {
                                result = children[i].getScore();
                            }
                        }
                        getGame().setPlayedFreeMove(false);
                        return result;
                    }
                }
            } else { // The free move was already played
                Move[] children = getMoveGenerator().getForcedMoves(getGame());
                if (children.length != 0) { // There are forced spaces left
                    for (int i = 0; i < children.length; i++) {
                        printMove(children[i], depth);
                        playMove(children[i]); // Put tile from rack on gameboard
                        increaseDepth();
                        children[i].setScore(minimax(depth + 1));
                        decreaseDepth();
                        undoMove(children[i]); // Remove tile from gameboard to rack
                        printMove(children[i], depth);
                        if (children[i].getScore() < result) {
                            result = children[i].getScore();
                        }
                    }
                    return result;
                } else { // There are no forced spaces left
                    getGame().nextPlayer(); // It's now the turn of the other player
                    result = minimax(depth); // We stay at the same depth
                    getGame().previousPlayer(); // Back to us
                    return result;
                }
            }
        }
        if (getGame().getCurrentPlayer() == getGame().getRobotPlayer()) {
            // Make a move for robot: this is a MAX node
            result = Double.NEGATIVE_INFINITY;
            if (!getGame().hasPlayedFreeMove()) { // Not yet played a free move
                Move[] children = getMoveGenerator().getForcedMoves(getGame());
                if (children.length != 0) { // There are forced spaces left
                    for (int i = 0; i < children.length; i++) {
                        printMove(children[i], depth);
                        playMove(children[i]); // Put tile from rack on gameboard
                        increaseDepth();
                        children[i].setScore(minimax(depth + 1));
                        decreaseDepth();
                        undoMove(children[i]); // Remove tile from gameboard to rack
                        printMove(children[i], depth);
                        if (children[i].getScore() > result) {
                            result = children[i].getScore();
                        }
                    }
                    return result;
                } else { // There are no forced spaces left, play the free move
                    children = getMoveGenerator().getFreeMoves(getGame());
                    getGame().setPlayedFreeMove(true);
                    for (int i = children.length - 1; i >= 0; i--) {
                        printMove(children[i], depth);
                        playMove(children[i]); // Put the tile from the rack on the gameboard
                        children[i].setScore(minimax(depth + 1));
                        undoMove(children[i]); // Remove tile from gameboard to rack
                        printMove(children[i], depth);
                        if (children[i].getScore() > result) {
                            result = children[i].getScore();
                        }
                    }
                    getGame().setPlayedFreeMove(false);
                    return result;
                }
            } else { // The free move was already played
                Move[] children = getMoveGenerator().getForcedMoves(getGame());
                if (children.length != 0) { // There are forced spaces left
                    for (int i = children.length - 1; i >= 0; i--) {
                        printMove(children[i], depth);
                        playMove(children[i]); // Put tile from rack on gameboard
                        increaseDepth();
                        children[i].setScore(minimax(depth + 1));
                        decreaseDepth();
                        undoMove(children[i]); // Remove tile from gameboard to rack
                        printMove(children[i], depth);
                        if (children[i].getScore() > result) {
                            result = children[i].getScore();
                        }
                    }
                    return result;
                } else { // There are no forced spaces left
                    getGame().nextPlayer(); // It's now the turn of the other player
                    result = minimax(depth); // We stay at the same depth
                    getGame().previousPlayer(); // Back to us
                    return result;
                }
            }
        }
    }
    System.out.println("Warning: This shouldn't happen");
    return Double.NaN;
}
A.2 Alphabeta algorithm

A.2.1 Initialising alphabeta
  private Move useAlphaBeta() throws SearchTerminatedException {
    Statistics.getInstance().reset();
    boolean forced;
    double alpha = Double.NEGATIVE_INFINITY;
    double beta = Double.POSITIVE_INFINITY;
    Move result = new Move(alpha);
    if (getGame().getCurrentPlayer() != getGame().getRobotPlayer()) {
      System.out.println("Warning: we are not playing for robot?");
    }
    Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
    if (forcedMoves.length == 0 && getGame().hasPlayedFreeMove()) {
      System.out.println("Warning: nothing to do for me... no forced move "
          + "and I have already played a free move.");
    }
    double resultScore = alpha;
    if (forcedMoves.length == 1) {
      System.out.println("Not much choice for me :)");
      Statistics.getInstance().informPossibilities(
          getGame().getGameBoard().getNumberOfPlayedTiles() + 1, 1, true);
      return forcedMoves[0];
    }
    if (forcedMoves.length > 0) {
      forced = true;
      Statistics.getInstance().informPossibilities(
          getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
          forcedMoves.length, true);
      forcedMoves = preSort(forcedMoves);
      for (int i = 0; i < forcedMoves.length; i++) {
        printMove(forcedMoves[i], 0);
        playMove(forcedMoves[i]);
        resultScore = alphabeta(1, alpha, beta, getGame().hasPlayedFreeMove());
        undoMove(forcedMoves[i]);
        forcedMoves[i].setScore(resultScore);
        printMove(forcedMoves[i], 0);
        if (resultScore > alpha) {
          alpha = resultScore;
          result = forcedMoves[i];
        }
      }
    } else {
      forced = false;
      Move[] freeMoves = getMoveGenerator().getFreeMoves(getGame());
      Statistics.getInstance().informPossibilities(
          getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
          freeMoves.length, false);
      getGame().setPlayedFreeMove(true);
      freeMoves = preSort(freeMoves);
      for (int i = 0; i < freeMoves.length; i++) {
        printMove(freeMoves[i], 0);
        playMove(freeMoves[i]);
        resultScore = alphabeta(1, alpha, beta, getGame().hasPlayedFreeMove());
        // resultScore = minimax(1);
        undoMove(freeMoves[i]);
        freeMoves[i].setScore(resultScore);
        printMove(freeMoves[i], 0);
        if (resultScore > alpha) {
          alpha = resultScore;
          result = freeMoves[i];
        }
      }
    }
    Statistics.getInstance().informEndOfSearch(
        getGame().getGameBoard().getNumberOfPlayedTiles() + 1,
        forced, getMaxDepth(), MAXFORCEDDEPTH, "alphabeta");
    result.setScore(resultScore);
    return result;
  }
A.2.2 The alphabeta algorithm
  private double alphabeta(int depth, double a, double b,
      boolean playedFreeMove) throws SearchTerminatedException {
    if (interrupted) {
      throw new SearchTerminatedException(
          "You don't want to know my result anymore!");
    }
    if (depth == getMaxDepth() || depth == MAXFORCEDDEPTH
        || getGame().getGameBoard().getNumberOfPlayedTiles() == 56) {
      Statistics.getInstance().informEvalution();
      return getHeuristic().getScore(getGame());
    }
    if (!getGame().getOtherPlayer().hasTilesLeft()) {
      Statistics.getInstance().informEvalution();
      return getHeuristic().getScore(getGame());
    }
    double result;
    if (!getGame().getCurrentPlayer().hasTilesLeft()) {
      getGame().nextPlayer();
      result = alphabeta(depth, a, b, false);
      getGame().previousPlayer();
      return result;
    }
    double alpha = Double.NEGATIVE_INFINITY;
    double beta = Double.POSITIVE_INFINITY;
    if (getGame().getCurrentPlayer() == getGame().getOtherPlayer()) {
      // This is a min-node
      if (!playedFreeMove) {
        Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
        if (forcedMoves.length > 0) {
          // There are forced spaces left to fill.
          for (int i = 0; i < forcedMoves.length; i++) {
            printMove(forcedMoves[i], depth);
            playMove(forcedMoves[i]);
            increaseDepth();
            result = alphabeta(depth + 1, a, Math.min(b, beta), false);
            decreaseDepth();
            undoMove(forcedMoves[i]);
            forcedMoves[i].setScore(result);
            printMove(forcedMoves[i], depth);
            beta = Math.min(result, beta);
            if (a >= beta || i == forcedMoves.length - 1) {
              printCutOff(depth);
              return beta;
            }
          }
        } else {
          // No forced spaces left.
          Move[] freeMoves = getMoveGenerator().getFreeMoves(getGame());
          for (int i = 0; i < freeMoves.length; i++) {
            printMove(freeMoves[i], depth);
            playMove(freeMoves[i]);
            result = alphabeta(depth + 1, a, Math.min(b, beta), true);
            undoMove(freeMoves[i]);
            freeMoves[i].setScore(result);
            printMove(freeMoves[i], depth);
            beta = Math.min(result, beta);
            if (a >= beta || i == freeMoves.length - 1) {
              printCutOff(depth);
              return beta;
            }
          }
        }
      } else {
        // Free move already played. Try the forced spaces.
        Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
        if (forcedMoves.length > 0) {
          for (int i = 0; i < forcedMoves.length; i++) {
            printMove(forcedMoves[i], depth);
            playMove(forcedMoves[i]);
            increaseDepth();
            result = alphabeta(depth + 1, a, Math.min(b, beta), true);
            decreaseDepth();
            undoMove(forcedMoves[i]);
            forcedMoves[i].setScore(result);
            printMove(forcedMoves[i], depth);
            beta = Math.min(result, beta);
            if (a >= beta || i == forcedMoves.length - 1) {
              printCutOff(depth);
              return beta;
            }
          }
        } else {
          // Turn for the other player.
          getGame().nextPlayer();
          result = alphabeta(depth, a, b, false);
          getGame().previousPlayer();
          return result;
        }
      }
    }
    if (getGame().getCurrentPlayer() == getGame().getRobotPlayer()) {
      // This is a max-node
      if (!playedFreeMove) { // Not yet played a free move
        Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
        if (forcedMoves.length > 0) {
          // There are still forced moves left.
          for (int i = 0; i < forcedMoves.length; i++) {
            printMove(forcedMoves[i], depth);
            playMove(forcedMoves[i]);
            increaseDepth();
            result = alphabeta(depth + 1, Math.max(a, alpha), b, false);
            decreaseDepth();
            undoMove(forcedMoves[i]);
            forcedMoves[i].setScore(result);
            printMove(forcedMoves[i], depth);
            alpha = Math.max(result, alpha);
            if (alpha >= b || i == forcedMoves.length - 1) {
              printCutOff(depth);
              return alpha;
            }
          }
        } else {
          // Play the free move.
          Move[] freeMoves = getMoveGenerator().getFreeMoves(getGame());
          for (int i = 0; i < freeMoves.length; i++) {
            printMove(freeMoves[i], depth);
            playMove(freeMoves[i]);
            result = alphabeta(depth + 1, Math.max(alpha, a), b, true);
            undoMove(freeMoves[i]);
            freeMoves[i].setScore(result);
            printMove(freeMoves[i], depth);
            alpha = Math.max(result, alpha);
            if (alpha >= b || i == freeMoves.length - 1) {
              printCutOff(depth);
              return alpha;
            }
          }
        }
      } else {
        // A free move was already played.
        Move[] forcedMoves = getMoveGenerator().getForcedMoves(getGame());
        if (forcedMoves.length > 0) {
          // There are forced moves left.
          for (int i = 0; i < forcedMoves.length; i++) {
            printMove(forcedMoves[i], depth);
            playMove(forcedMoves[i]);
            increaseDepth();
            result = alphabeta(depth + 1, Math.max(alpha, a), b, true);
            decreaseDepth();
            undoMove(forcedMoves[i]);
            forcedMoves[i].setScore(result);
            printMove(forcedMoves[i], depth);
            alpha = Math.max(result, alpha);
            if (alpha >= b || i == forcedMoves.length - 1) {
              printCutOff(depth);
              return alpha;
            }
          }
        } else {
          // No forced spaces left, next player's turn now.
          getGame().nextPlayer();
          result = alphabeta(depth, a, b, false);
          getGame().previousPlayer();
          return result;
        }
      }
    }
    System.err.println("Warning: This shouldn't happen");
    return Double.NEGATIVE_INFINITY;
  }
Appendix B
Source code of the heuristic algorithms

B.1 Heuristic algorithms

B.1.1 Measuring real line length
  private void countLines(Game game) {
    boolean[][] checked = new boolean[67][36];
    GameBoard gb = game.getGameBoard();
    int x, y, xb, yb, d, xl, yl;
    Line line;
    Color color;
    Tile tile;
    for (int player = 0; player < game.getPlayers().length; player++) {
      if (game.getPlayer(player) == null) {
        continue;
      }
      color = game.getPlayer(player).getColor();
      $lines[player] = new Vector();
      clearChecked(checked);
      // Check for open lines.
      for (int outline = 0; outline < gb.getOutLine().length; outline++) {
        x = gb.getOutLine()[outline][0];
        y = gb.getOutLine()[outline][1];
        for (int direction = 0; direction < 6; direction++) {
          xb = x + GameBoard.DX[direction];
          yb = y + GameBoard.DY[direction];
          if (gb.hasTile(xb, yb)
              && gb.getTile(xb, yb).getColorOfDirection(direction + 3)
                  .equals(color)
              && !checked[xb][yb]) {
            checked[xb][yb] = true;
            d = (direction + 3) % 6;
            line = new Line(color, xb, yb, d);
            line.increaseLength();
            tile = gb.getTile(xb, yb);
            d = tile.getOtherSide(d);
            xl = xb + GameBoard.DX[d];
            yl = yb + GameBoard.DY[d];
            d = (d + 3) % 6;
            while (gb.hasTile(xl, yl)) {
              checked[xl][yl] = true;
              line.increaseLength();
              tile = gb.getTile(xl, yl);
              d = tile.getOtherSide(d);
              xl = xl + GameBoard.DX[d];
              yl = yl + GameBoard.DY[d];
              d = (d + 3) % 6;
            }
            xl = xl + GameBoard.DX[d];
            yl = yl + GameBoard.DY[d];
            d = (d + 3) % 6;
            line.closeLine(xl, yl, d);
            line.setRealLineLengthScore(line.getLength());
            $lines[player].add(line);
          }
        }
      }
      // Check for loops.
      for (x = 0; x < checked.length; x++) {
        for (y = 0; y < checked[x].length; y++) {
          tile = gb.getTile(x, y);
          if (tile != null && !checked[x][y]) {
            checked[x][y] = true;
            if (tile.hasColor(color)) {
              d = tile.getDirectionsOf(game.getPlayer(player).getColor())[0];
              xl = x + GameBoard.DX[d];
              yl = y + GameBoard.DY[d];
              line = new Line(1);
              d = (d + 3) % 6;
              while (!(xl == x && yl == y)) {
                checked[xl][yl] = true;
                line.increaseLength();
                tile = game.getGameBoard().getTile(xl, yl);
                d = tile.getOtherSide(d);
                xl += GameBoard.DX[d];
                yl += GameBoard.DY[d];
                d = (d + 3) % 6;
              }
              line.setLoop(true);
              line.setColor(color);
              line.setRealLineLengthScore(2 * line.getLength());
              $lines[player].add(line);
            }
          }
        }
      }
    }
  }
B.1.2 Measuring virtual line length
  private void countVirtualLines(Game game) {
    boolean[][] checked = new boolean[67][36];
    GameBoard gb = game.getGameBoard();
    int x, y, xb, yb, d, xl, yl;
    Line line;
    Color color;
    Tile tile;
    boolean isGap;
    for (int player = 0; player < game.getPlayers().length; player++) {
      if (game.getPlayer(player) == null) {
        continue;
      }
      color = game.getPlayer(player).getColor();
      $virtualLines[player] = new Vector();
      clearChecked(checked);
      // Check for open lines.
      for (int outline = 0; outline < gb.getOutLine().length; outline++) {
        x = gb.getOutLine()[outline][0];
        y = gb.getOutLine()[outline][1];
        if (isStartOfVirtualLine(game, color, x, y)) {
          for (int direction = 0; direction < 6; direction++) {
            xb = x + GameBoard.DX[direction];
            yb = y + GameBoard.DY[direction];
            if (gb.hasTile(xb, yb)
                && gb.getTile(xb, yb).getColorOfDirection(direction + 3)
                    .equals(color)
                && !checked[xb][yb]) {
              checked[xb][yb] = true;
              d = (direction + 3) % 6;
              line = new Line(color, xb, yb, d);
              line.increaseLength();
              tile = gb.getTile(xb, yb);
              d = tile.getOtherSide(d);
              xl = xb + GameBoard.DX[d];
              yl = yb + GameBoard.DY[d];
              d = (d + 3) % 6;
              isGap = isGap(gb, color, xl, yl);
              while (gb.hasTile(xl, yl) || isGap) {
                checked[xl][yl] = true;
                line.increaseLength();
                if (isGap) {
                  if (canBeFilled(game, xl, yl)) {
                    line.addGap(xl, yl);
                    d = getOtherDirectionOfGap(gb, color, xl, yl, d);
                  } else {
                    xl = xl + GameBoard.DX[d];
                    yl = yl + GameBoard.DY[d];
                    d = (d + 3) % 6;
                    line.closeLine(xl, yl, d);
                    line.decreaseLength();
                    line.setVirtualLineLengthScore(
                        line.getLength() - 0.3 * line.getNbGaps());
                    $virtualLines[player].add(line);
                    xl = xl + GameBoard.DX[d];
                    yl = yl + GameBoard.DY[d];
                    d = (d + 3) % 6;
                    d = getOtherDirectionOfGap(gb, color, xl, yl, d);
                    xl += GameBoard.DX[d];
                    yl += GameBoard.DY[d];
                    d = (d + 3) % 6;
                    line = new Line(color, xl, yl, d);
                    line.increaseLength();
                    tile = gb.getTile(xl, yl);
                    checked[xl][yl] = true;
                    d = tile.getOtherSide(d);
                  }
                } else {
                  tile = gb.getTile(xl, yl);
                  d = tile.getOtherSide(d);
                }
                xl = xl + GameBoard.DX[d];
                yl = yl + GameBoard.DY[d];
                d = (d + 3) % 6;
                isGap = isGap(gb, color, xl, yl);
              }
              xl = xl + GameBoard.DX[d];
              yl = yl + GameBoard.DY[d];
              d = (d + 3) % 6;
              line.closeLine(xl, yl, d);
              line.setVirtualLineLengthScore(
                  line.getLength() - 0.3 * line.getNbGaps());
              $virtualLines[player].add(line);
            }
          }
        }
      }
      // Check for loops.
      for (x = 0; x < checked.length; x++) {
        for (y = 0; y < checked[x].length; y++) {
          tile = gb.getTile(x, y);
          if (tile != null && !checked[x][y]) {
            checked[x][y] = true;
            if (tile.hasColor(color)) {
              d = tile.getDirectionsOf(game.getPlayer(player).getColor())[0];
              xl = x + GameBoard.DX[d];
              yl = y + GameBoard.DY[d];
              boolean mightBeClosed = true;
              line = new Line(1);
              d = (d + 3) % 6;
              while (!(xl == x && yl == y)) {
                checked[xl][yl] = true;
                line.increaseLength();
                tile = game.getGameBoard().getTile(xl, yl);
                if (tile == null) {
                  line.addGap(xl, yl);
                  mightBeClosed = canBeFilled(game, xl, yl);
                  d = getOtherDirectionOfGap(gb, color, xl, yl, d);
                } else {
                  d = tile.getOtherSide(d);
                }
                xl += GameBoard.DX[d];
                yl += GameBoard.DY[d];
                d = (d + 3) % 6;
              }
              line.setLoop(true);
              line.setColor(color);
              if (mightBeClosed) {
                line.setVirtualLineLengthScore(
                    2 * line.getLength() - 0.6 * line.getNbGaps());
              } else {
                line.setVirtualLineLengthScore(
                    line.getLength() - 0.3 * line.getNbGaps());
              }
              $virtualLines[player].add(line);
            }
          }
        }
      }
    }
  }
B.1.3 Measuring fair share line length
  /**
   * @pre: This method is called after countVirtualLines
   */
  private void countFairShareLines(Game game) {
    Color color;
    Line line;
    GameBoard gb = game.getGameBoard();
    double fsscore;
    double NB, NT;
    double TB, RT, nbFittingTiles, nbColorMatchingTiles;
    int x, y;
    int[][] endpoints;
    for (int player = 0; player < game.getPlayers().length; player++) {
      if (game.getPlayer(player) == null) {
        continue;
      }
      color = game.getPlayer(player).getColor();
      NT = getNbRemainingTilesOfColor(game, color);
      TB = game.getTileBag().getNbRemainingTiles();
      RT = 56 - gb.getNumberOfPlayedTiles();
      for (int i = 0; i < $virtualLines[player].size(); i++) {
        line = (Line) $virtualLines[player].elementAt(i);
        if (line.isLoop()) {
          line.setFairShareScore(line.getVirtualLineLenghtScore());
          continue;
        }
        fsscore = line.getLength();
        NT -= line.getNbGaps();
        endpoints = line.getEndPoints();
        for (int ep = 0; ep < endpoints.length; ep++) {
          x = endpoints[ep][0];
          y = endpoints[ep][1];
          if (gb.isForced(x, y) || gb.inFreeUnforcedList(x, y)) {
            if (tileFromRackFits(game, x, y)) {
              fsscore += NT / 4D;
            } else {
              nbFittingTiles = getNbFittingTilesFromBag(game, x, y);
              if (nbFittingTiles > 0) {
                fsscore += (NT / 4D)
                    - (TB / (nbFittingTiles + 1D)) * (NT / RT);
              } else {
                nbColorMatchingTiles = getNbColorMatchingTiles(game, x, y);
                if (nbColorMatchingTiles > 0) {
                  fsscore += Math.min(3D, nbColorMatchingTiles / 3D);
                }
              }
            }
          } else {
            int fx = gb.getOriginForcedSpace(x, y)[0];
            int fy = gb.getOriginForcedSpace(x, y)[1];
            int ba = gb.getForcedSpaceDistance(x, y);
            nbFittingTiles = getNbFittingTilesFromBag(game, fx, fy);
            if (nbFittingTiles > 0) {
              fsscore += (NT / 4D)
                  - (TB / (nbFittingTiles + 1D)) * (NT / RT) - ba;
            } else {
              nbColorMatchingTiles = getNbColorMatchingTiles(game, x, y);
              if (nbColorMatchingTiles > 0) {
                fsscore += Math.min(3D, nbColorMatchingTiles / 3D);
              }
            }
          }
        }
        line.setFairShareScore(fsscore);
        NT += line.getNbGaps();
      }
    }
  }
B.1.4 Combination of heuristics
Startgame score

  /**
   * Verify that lines are counted before calling this method
   */
  private double getBeginGameScore(Game game) {
    int robot = game.getRobotPlayer().getColorNumber();
    Line line;
    double fss, vls;
    double maxRobot = 0;
    if ($virtualLines[robot].size() == 0) {
      maxRobot = getBeginScore(0, 21);
    }
    for (int i = 0; i < $virtualLines[robot].size(); i++) {
      line = (Line) $virtualLines[robot].elementAt(i);
      vls = line.getVirtualLineLenghtScore();
      fss = line.getFairShareScore();
      if (line.isLoop() && line.getVirtualLineLenghtScore() < 14) {
        vls *= -10;
      }
      maxRobot = Math.max(maxRobot, getBeginScore(vls, fss));
    }
    int other = game.getOtherPlayer().getColorNumber();
    double maxOther = 0;
    if ($virtualLines[other].size() == 0) {
      maxOther = getBeginScore(0, 21);
    }
    for (int i = 0; i < $virtualLines[other].size(); i++) {
      line = (Line) $virtualLines[other].elementAt(i);
      vls = line.getVirtualLineLenghtScore();
      fss = line.getFairShareScore();
      if (line.isLoop() && line.getVirtualLineLenghtScore() < 14) {
        vls *= -10;
      }
      maxOther = Math.max(maxOther, getBeginScore(vls, fss));
    }
    return maxRobot - maxOther;
  }

  private double getBeginScore(double vls, double fss) {
    return 6 * vls + fss;
  }
Endgame score

  /**
   * Verify that lines are counted before calling this method.
   */
  private double getEndGameScore(Game game) {
    int robot = game.getRobotPlayer().getColorNumber();
    Line line, vline;
    double maxRobotrls = 0, maxRobotfss = 0;
    for (int i = 0; i < $lines[robot].size(); i++) {
      line = (Line) $lines[robot].elementAt(i);
      // Loops are not counted anymore, this method should only be used
      // for end games.
      if (line.getRealLineLengthScore() > maxRobotrls) {
        maxRobotrls = line.getRealLineLengthScore();
      }
    }
    for (int i = 0; i < $virtualLines[robot].size(); i++) {
      line = (Line) $virtualLines[robot].elementAt(i);
      if (line.getFairShareScore() > maxRobotfss) {
        maxRobotfss = line.getFairShareScore();
      }
    }
    int other = game.getOtherPlayer().getColorNumber();
    double maxOtherrls = 0, maxOtherfss = 0;
    for (int i = 0; i < $lines[other].size(); i++) {
      line = (Line) $lines[other].elementAt(i);
      // Loops are not counted anymore, this method should only be used
      // for end games.
      if (line.getRealLineLengthScore() > maxOtherrls) {
        maxOtherrls = line.getRealLineLengthScore();
      }
    }
    for (int i = 0; i < $virtualLines[other].size(); i++) {
      line = (Line) $virtualLines[other].elementAt(i);
      if (line.getFairShareScore() > maxOtherfss) {
        maxOtherfss = line.getFairShareScore();
      }
    }
    return getEndScore(maxRobotrls, maxRobotfss)
        - getEndScore(maxOtherrls, maxOtherfss);
  }

  private double getEndScore(double rls, double fss) {
    return 6 * rls + fss;
  }
Global score

  public double getScore(Game game) {
    int nbtiles = game.getGameBoard().getNumberOfPlayedTiles();
    if (nbtiles <= 35) {
      countVirtualLines(game);
      countFairShareLines(game);
      return getBeginGameScore(game);
    }
    countLines(game);
    countVirtualLines(game);
    countFairShareLines(game);
    if (nbtiles >= 44) {
      return getEndGameScore(game);
    }
    return ((double) (nbtiles - 35) / 11D) * getEndGameScore(game)
        + ((double) (44 - nbtiles) / 11D) * getBeginGameScore(game);
  }
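For intermediate positions (36 to 43 played tiles), getScore blends the begin-game and endgame heuristics linearly in the number of played tiles. The weighting can be isolated in a small stand-alone sketch; the class and method names here are illustrative, not part of the thesis code:

```java
public class BlendDemo {
    // Linear blend of the endgame and begin-game scores for 36..43 played
    // tiles, mirroring the weighting in getScore above. Note that the two
    // weights sum to 9/11 rather than 1, as in the original listing.
    static double blend(int nbtiles, double endScore, double beginScore) {
        return ((double) (nbtiles - 35) / 11.0) * endScore
             + ((double) (44 - nbtiles) / 11.0) * beginScore;
    }

    public static void main(String[] args) {
        // At 40 played tiles the endgame score has weight 5/11 and the
        // begin-game score has weight 4/11.
        System.out.println(blend(40, 11.0, 0.0)); // 5.0
        System.out.println(blend(40, 0.0, 11.0)); // 4.0
    }
}
```

As the board fills up, the endgame heuristic (based on real line length) thus gradually takes over from the begin-game heuristic (based on virtual line length).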
Appendix C
Analysis of some games

Two games between Oliver and Rob Morton were played to analyse Oliver's performance. Both games were won by Rob Morton. A summary of the comments on the games is given below. The games themselves can be reviewed online with the reviewer on Tantrix.com. For these two games, Rob was asked to play as well as possible.
C.1 A first game
This game was saved as U!cheekymbk-23-Oliver-18-2005-05-09-1752.
• Move 6: Rob has set up the game so he can waste green tiles on the left on the BR link. This way, Oliver will have two separate lines.
• Move 8: Oliver makes a strange choice, considering his fifth tile.
• Move 16: Oliver makes a block here, but he should have diverted the yellow line to the other side. Three RBB tiles are gone and Oliver is holding the fourth.
• Move 17: Rob is trying to waste as many GGs as possible.
• Move 19: The choice in move 16 results in a bad game situation.
• Move 21: The GGB on top is very likely to join, as there are four left. There are no points for making lookalikes, and a pair block cannot be made either. The best choice for Rob is to attack.
• Move 25: Oliver adds two tiles to Rob's longest line in the hope that this will also add two tiles to his own longest line. Rob's line is already certain, while the connection for Oliver is not certain yet, as the tiles are only indirectly connected. This was a bad choice by Oliver.
• Move 26: The third tile in the lower right corner is a better choice for Oliver. It forces Rob into a short loop and Oliver adds indirectly to his line.
• Move 28: Rob made a good block for Oliver's line. It was worth the risk, but it did not work.
• Move 39: Rob is wasting green here and makes a lookalike space for the GRG tile. This wastes the crucial tile for connecting at the lower left side of the Tantrix and gives Rob the chance to join on the right.
• Move 40: The BBRGRG tile is the last GRG for a bottom join for Oliver.
• Move 43: Oliver wastes the YY corners. Very good move.
• Move 49: Oliver's first move after the tile bag is empty. Oliver blocks Rob. He could not add to his line and wasted both YY tiles.
• Move 50: Rob cannot waste both of Oliver's tiles.
The key move in this game is move 16.
C.2 A second game
This game was saved as U!Oliver-15-cheekymbk-25-2005-05-09-1826.
• Move 3: Oliver should have prevented the bottom red from getting short-looped.
• Move 8: Oliver will have two separate lines after Rob's move.
• Move 13: Rob can easily add two tiles at the top while wasting red tiles.
• Move 21: Here Rob fully exploits the mistake made in move 3.
• Move 26: Oliver chooses an indirect extension of his line by two tiles, while Rob can extend his line better. In this situation there was no really good move.
• Move 39: Serious loop threat. A tile with a straight yellow line over blue and red is still available in the bag.
• Move 46: Rob did not play the best move here.
• Move 49: Oliver exploited Rob's mistake, blocking Rob's line permanently.
• Move 51: Oliver should have played the first tile in the upper right corner of the Tantrix. That way, Rob cannot extend his line on the next move and waste the last red tile from yellow.
The key move in this game is move 3.
Appendix D
Dutch summary (Nederlandse samenvatting)

D.1 Introduction
Since research in artificial intelligence blossomed in the fifties, much work has been done on board games. The reason is that most board games have a well-defined set of rules. It is therefore easy to represent situations and moves, and no knowledge about the rest of the world is required. Furthermore, board games have interesting search spaces: these spaces are large and complex, so that adapted search algorithms are needed, and heuristics are very important. A number of successes have already been achieved in this area. At draughts and Othello the computer convincingly beats the best players. At chess and backgammon the computer is also better, but by a less convincing margin. The programs for Othello and backgammon use automatic learning techniques. Despite these successes, many problems remain: every computer program that plays go can easily be beaten by players of average level.

Tantrix is a strategic board game that first appeared in its current form in 1992. The game was invented by Mike McManaway from New Zealand. Luck also plays a role in this game. In 1997 the first robots were developed; these robots could play fast, but could also easily be beaten. The goal of this thesis is to build a new robot that plays Tantrix at the 'master' level.

Chapter 1 gives an overview of the history of Tantrix and explains the rules of the game. Chapter 2 surveys the existing robots and elaborates on the strategies that have been described so far for playing Tantrix. Chapter 3 works out how minimax and alphabeta were implemented, and explains the heuristics used and how they are combined. Chapter 4 discusses how the algorithms were implemented. The results of the implementations are discussed in chapter 5. Chapter 6 then makes some suggestions for further improvements. Finally, the text ends with a conclusion.
D.2 Description of the game

D.2.1 The Tantrix tiles
Tantrix consists of a set of 56 different hexagonal tiles. Each tile has a black background with lines in three colours on it. Figure 1.2 shows all 56 different tiles; every tile is unique. In total there are four colours: yellow, red, green and blue. Since there are 56 tiles, 42 of them have a blue line, 42 a red line, 42 a green line and 42 a yellow line. There are four possible shapes of lines on a tile; the possibilities are shown in figure 1.3. The tiles of Tantrix can be compared with the cards of a card game: in both games it is very important to know what has already been played, what is still available and what the different players are holding.
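The count of 42 tiles per colour follows from a symmetry argument: each of the 56 tiles carries three of the four colours, so every colour appears on 56 × 3 / 4 = 42 tiles. A one-line check (illustrative only):

```java
public class TileCount {
    public static void main(String[] args) {
        // Each tile shows 3 of the 4 colours, so by symmetry each colour
        // appears on 56 * 3 / 4 tiles.
        System.out.println(56 * 3 / 4); // 42
    }
}
```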
D.2.2 The aim of the game
Tantrix is typically played with two players, but can also be played with three or four. Each player chooses a colour: yellow, red, green or blue, the colours of the lines on the tiles. Each player then draws a random tile from the bag. All tiles are numbered, and the player holding the tile with the highest number starts. Each player then draws five more tiles from the bag; the remaining tiles stay hidden in the bag. Every player must also keep his tiles visible to the opponent. The aim of the game is to make the line or loop that scores the most points; only the line or loop with the highest score counts. Points are awarded by counting one point for every tile on a line and two points for every tile on a loop. The first player starts by placing one of his six tiles in the middle of the table, and then draws a new random tile from the bag. Next, the player to his left plays a tile and draws a random tile in turn. The total thinking time for tournament games is limited to fifteen minutes.
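The scoring rule is simple enough to state in code. The sketch below is our own illustration, not the thesis's algorithm: one point per tile on a line, two per tile on a loop, and only a player's single best line or loop counts.

```java
public class TantrixScore {
    /** One point per tile on a line, two points per tile on a loop. */
    static int score(int tileCount, boolean isLoop) {
        return isLoop ? 2 * tileCount : tileCount;
    }

    /** Only the single highest-scoring line or loop counts for a player. */
    static int finalScore(int[] lineLengths, int[] loopLengths) {
        int best = 0;
        for (int n : lineLengths) best = Math.max(best, score(n, false));
        for (int n : loopLengths) best = Math.max(best, score(n, true));
        return best;
    }

    public static void main(String[] args) {
        // As in figure 1.10: red's best is a 22-tile line, blue's a 22-tile loop.
        System.out.println(finalScore(new int[]{22, 8}, new int[]{}));  // 22
        System.out.println(finalScore(new int[]{6}, new int[]{22}));    // 44
    }
}
```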
D.2.3 The rules of the game
A turn

A turn in Tantrix typically consists of three consecutive actions. First, all forced spaces that can be filled must be filled. Then one free move may be played. After that, all possible forced spaces must again be filled. A forced space arises when an empty space is enclosed by at least three tiles. If several tiles fit in a forced space, any one of them may be played. If a tile fits in two or more forced spaces at once, the player may choose in which forced space to play it.

Rules and restrictions for playing a tile

The golden rule of Tantrix says that when a tile is played, the colours of the touching tiles must match. This is illustrated in figure 1.5. There are three further rules that remain in force as long as there are tiles in the bag. The first rule, illustrated in figure 1.6, says that no forced space may be created that is enclosed by three identical colours, since no tile would fit in it. Furthermore, it is not allowed to enclose an empty space on the board with four tiles; this rule is illustrated in figure 1.7. Finally, it is not allowed to play along a controlled side. This is explained in figure 1.8, and figure 1.9 shows such a controlled side: space A must be filled before space B, space B before C, and space C before D. In the endgame, the phase of the game when the bag is empty, all restrictions except the golden rule are lifted. The game is over when all tiles have been placed on the board. An example of a finished game is shown in figure 1.10, where a player with colour blue played against a player with colour red. Here 'blue' has won. 'Red' has a line 22 tiles long, and no loop. All of red's other lines are shorter than 22 tiles and therefore score fewer points; since the final score counts only the longest line, red's final score is 22 points. 'Blue' has a loop of 22 tiles, worth 44 points. No other line of blue's scores as many points, so blue's final score is 44 points.
D.3 Existing robots and human strategies

D.3.1 Existing robots
This work is not the first attempt to build a robot for playing Tantrix. In 1997 Richard Clarke programmed two robots, 'Challenger' and 'Defender'. These two robots were sold commercially, but can easily be beaten by beginning players. In the same year Dave Dyer programmed three more robots: Robot3, Robot2 and Robot. At the start of this project, Robot was the best Tantrix robot. In reaction to this project, Dyer also programmed GoodBot. This robot plays at an advanced level and has an online lobby ranking of about 840 points. How these rankings work is described in section 5.1.3, and how Robot works is briefly described in section 2.1.2. The robots do not use alpha-beta, but in all likelihood minimax. Nor do they evaluate to any great depth: only the opponent's forced replies are taken into account.
D.3.2 Human strategies
Since 1992 a great many games of Tantrix have been played, and a number of general techniques for playing the game well are now known. The game has three phases. The first phase is the opening, which covers roughly the first twenty moves. Then comes the middlegame. From the moment the bag is empty, we speak of the endgame. For each of these phases there are strategies that are typically employed, although they may also be used in the other phases.

Opening

• In the opening it is important above all to make indirect connections, as shown in figure 2.1.
• Since there are only 42 tiles of a given colour, one should certainly try, in the opening, to make the opponent waste tiles on a small loop that he can no longer extend and with which he cannot possibly win. Such small loops are illustrated in figures 2.2 and 2.3.
• Medium-sized loops, such as the one shown in figure 2.4, are usually also too small to win with: the average final score in Tantrix is about 21 points.
• Threatening a large loop is interesting, because the opponent must then defend by preventing the loop from closing. He can usually only do this by extending one of the two endpoints in another direction.

Middlegame

• Since only the longest line counts, it is important to build one long line. The further the game progresses, the harder it becomes to connect two separate lines. An example is shown in figure 2.5.
• It is important to know which tiles are left in the bag. All tiles that fit on an endpoint may already have been played; in that case the line can no longer be extended along that end. Such a situation is shown in figure 2.6.
• One can also create a forced space that cannot be filled immediately. This produces a controlled side along which nobody may play, which can block one or more of the opponent's endpoints. Several kinds of blocks exist in Tantrix. The ordinary way of blocking is illustrated in figure 2.7. If the two ends of a line come out on the same side of the Tantrix, they can sometimes be blocked with a single tile, as illustrated in figure 2.8. It is also possible to create a forced space with colour pairs along one or more sides: since a forced space may not be surrounded by three identical colours, fewer tiles fit in it as long as the bag is not empty. An example of such a situation is given in figure 2.9.
• It is also possible to create a 'lookalike' space on the board. This is a forced space in which a tile fits that also fits somewhere else on the board. Such a space is typically created by master players. As shown in figure 2.10, it is best created when one tile remains that is strategically important to the opponent: if one is then lucky enough to draw that tile oneself, it can be played away in the lookalike.

Endgame

In the endgame the most important thing is to be the first to move after the bag is empty. Since several restrictions are then lifted, the opponent can easily be blocked. An example of a line blocked during the endgame is given in figure 2.11; at any other time, creating a space surrounded by three identical colours is not allowed.
D.4 Algorithms and heuristics

D.4.1 The game tree of Tantrix
To reason about games, a game tree is generally used. An example of a game tree for Tantrix is given in figure 3.1. The nodes in the tree represent game positions. In game trees for two-player games there are two kinds of nodes: max nodes and min nodes. The player for whom the tree is built moves in the max nodes, the opponent in the min nodes. In the max nodes that player tries to bend the state of the game maximally to his advantage, while the opponent does the opposite and bends the game minimally to the first player's advantage. In Tantrix, several max or min nodes may follow one another. For most games, searching the complete tree is an impossible task, and this also holds for Tantrix. We therefore use a heuristic: a function that, for a given game position, returns a real number measuring how favourable the position is for the player. A heuristic alone, however, is not enough to decide on the next move.

Minimax

If the computer wants to play the best move, it has to find the optimal path in the game tree. This path leads to a leaf of the tree where the computer wins by the largest possible margin, or loses by the smallest possible one. Searching the entire tree is not possible, but if we stop at a reasonable depth, it suffices to compare the game positions at that point with a heuristic. By applying the minimax algorithm to a game tree and stopping at a certain depth, we can find the optimal path to the position judged best by the heuristic. The operation of minimax is given in algorithm 3.1.1.

Minimax for Tantrix

Algorithms 3.1.2 and 3.1.3 show how minimax was used in a first implementation. To obtain better results, a few adjustments were made.

• It was assumed that no tiles enter the game from the bag while the game tree is being searched. During the search, the minimax algorithm thus places one of a player's tiles on the board, but the player does not draw a new tile from the bag. This was necessary to keep the branching factor of the game tree low enough; game trees with too high a branching factor are hard to search, which is a known problem in, for example, go.
• It is important to search deeper when interesting situations arise, and also to know how a given move plays out. The algorithm therefore searches on when a forced move is played; the branching is then low, so the extra search is affordable.
• The best results were found to be achieved with a search depth of three moves. When forced moves are played, this depth is extended each time, but never beyond eight moves deep.

Alpha-beta

Intuitively we know that it is not necessary to examine every possible end position. As figure 3.2 illustrates, there are always moves in the tree that quickly turn out to be very bad. Moreover, some of the opponent's moves are so good for a player that the opponent would never choose them. While searching, we can maintain an alpha and a beta value in every node along the path under examination. These two values give a lower and an upper bound on the eventual result. If, for a certain branch, no value within these bounds can possibly be found any more (because α ≥ β), that branch need not be examined further.
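The pruning idea can be sketched compactly. The toy tree below is our own illustration, not the thesis's algorithms 3.1.2–3.1.3, which work on real game positions (and where, in Tantrix, consecutive max or min nodes can occur). Here a node is either a Double leaf carrying a heuristic value or a list of child nodes, with max and min levels alternating:

```java
import java.util.Arrays;
import java.util.List;

public class AlphaBeta {
    /** Depth-limited minimax with alpha-beta pruning over a toy tree:
     *  a node is either a Double (leaf heuristic value) or a List of
     *  children; max and min levels alternate. */
    static double search(Object node, boolean maxToMove, double alpha, double beta) {
        if (node instanceof Double) return (Double) node;
        List<?> children = (List<?>) node;
        double best = maxToMove ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Object child : children) {
            double v = search(child, !maxToMove, alpha, beta);
            if (maxToMove) { best = Math.max(best, v); alpha = Math.max(alpha, v); }
            else           { best = Math.min(best, v); beta  = Math.min(beta,  v); }
            if (alpha >= beta) break; // prune: this branch cannot change the result
        }
        return best;
    }

    public static void main(String[] args) {
        // Root is a max node; min replies; leaves carry heuristic scores.
        // In the second branch, the leaf 9.0 is pruned: once min has 2.0,
        // the branch can no longer beat the 3.0 already guaranteed at the root.
        Object tree = Arrays.asList(
                Arrays.asList(3.0, 5.0),
                Arrays.asList(2.0, 9.0));
        System.out.println(search(tree, true,
                Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY)); // prints 3.0
    }
}
```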
D.4.2 Heuristics
Both minimax and alpha-beta rely on the fact that the game position in the leaves of the tree can be evaluated. A game position is evaluated with a heuristic. A number of heuristics were worked out, together with a way to combine their results into one global total. This total is a real number expressing how good the game position is: a negative number indicates an unfavourable position, and the larger the number, the better the position.

Difference in real line length

A trivial heuristic is to compute the difference in real length between the longest lines, which is nothing other than the difference in the intermediate score. Figure 3.4 shows a position in which yellow has a real line 11 tiles long, while blue has a line 7 tiles long. Yellow is clearly winning here, and the difference in intermediate scores, 4, is a measure of yellow's advantage. This difference in score can be computed with algorithm 3.3.1.
Virtual line length

Taking only the difference in intermediate score into account is certainly insufficient. Especially in the opening it is important to make indirect connections: early on, gaps in the line are advantageous, as long as they can still be filled. An example is given in figure 3.5. Although 'blue' leads on points here, the position is nevertheless more favourable for 'yellow'. Algorithm 3.3.2 gives a way to compute this difference.

'Fair share' line length

The first two heuristics are a good measure of the current state of the line, but take no account of future possibilities. This heuristic tries to estimate the number of tiles that can still be added at each of the two endpoints of the line. Figure 3.6 shows an example of such a position. In this figure the intermediate score for yellow and green is equal, yet green is better off: not only is green's longest virtual line longer than yellow's, but the possibilities at the endpoints of green's longest line are also more extensive. Algorithms 3.3.3 and 3.3.4 present a way to compute this, and section 3.3.3 works the algorithm out for the position in figure 3.6.

Fill heuristic

As figure 3.7 shows, it is sometimes interesting to know how blocked situations can end. In this figure 'red' has a real line 13 tiles long, but depending on which tiles still fit on the orange dots, red might well make a large loop. Algorithm 3.3.5 presents a way to compute this.

Combining the heuristics

The previous sections presented several measures of the quality of a game position. Each of these heuristics returns a real number expressing how well the board position satisfies the specific property that the heuristic measures. Algorithms 3.3.6, 3.3.7 and 3.3.8 show how they could be combined. This combination is derived from the findings of chapter 2. At the start of the game the virtual line length is very important, whereas at the end the real line length matters most; at every moment of the game it is important to keep the endpoints of the longest line open. Virtual line length and fair share line length are therefore combined into a begin-game score, and real line length and fair share line length into an end-game score. Between moves 35 and 44, the last eleven moves before the bag is empty, the weight of the begin-game score decreases and that of the end-game score increases linearly.
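The linear crossover just described can be written down directly. The class below is our own illustration of the scheme, not Oliver's exact code: before move 35 only the begin-game score counts, after move 44 only the end-game score, and in between the weight shifts linearly.

```java
public class CombinedHeuristic {
    /** Blend of begin-game and end-game scores with a linear crossover
     *  between moves 35 and 44 (an illustration of the scheme only). */
    static double combined(int move, double beginScore, double endScore) {
        double w; // weight of the end-game score, between 0 and 1
        if (move <= 35) {
            w = 0.0;               // opening and middlegame
        } else if (move >= 44) {
            w = 1.0;               // bag (almost) empty
        } else {
            w = (move - 35) / 9.0; // rises linearly from 0 to 1
        }
        return (1.0 - w) * beginScore + w * endScore;
    }

    public static void main(String[] args) {
        System.out.println(combined(10, 8.0, 2.0)); // 8.0: begin-game score only
        System.out.println(combined(50, 8.0, 2.0)); // 2.0: end-game score only
    }
}
```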
D.5 Implementation
The 'interface'

Since 1997 it has been possible to play Tantrix online at Tantrix.com, and one of the main goals was therefore to implement a robot that could play there. The result of the implementation is integrated into the Java applet on Tantrix.com; this robot is called Oliver. To fit the robot into the existing framework, the interface given in listing 4.1 had to be implemented. The only thing the robot receives is the game position for which it has to compute a move; section 4.1.1 explains how the game position is obtained. The programmer of Tantrix.com has also put together a 'robot kit' with extra software for playing on a test server. This has its advantages and disadvantages. Designing and implementing a user interface to visualise the board and the players' tiles would have taken a lot of work; the robot kit contains an applet that shows the game position on screen. The robot kit also ensures that every game is saved, and afterwards each game can be replayed with a 'reviewer'. For building an even stronger robot, however, this interface has a number of shortcomings: it is not possible to ask the robot to evaluate particular moves, to replay a game from a given point onwards, or to obtain more visual detail about the search in the game tree.

Design of Oliver

Figure 4.3 summarises the structure of Oliver. Information about the state of the game comes in through the interface, and from this information a new representation is built. The implementation therefore shares no components with the existing robot: the game, board, tiles, players and tile bag are represented by dedicated classes with their own methods. Once the representation has been converted into Java objects, these objects are passed to another object that searches for moves. There are also two classes for keeping game statistics. These classes use a MySQL database to store information about the branching factor, the number of evaluated game positions and the games played. This information can be queried afterwards for further calculations, which is useful, for example, to see whether an addition or change results in a better robot.
D.5.1 A distributed implementation
In the first implementation it was assumed that no new tiles enter the game from the bag. As the results indicate, this assumption was not unreasonable, but there may be room for improvement if incoming tiles are taken into account after all. A distributed system was therefore set up using Java RMI. This system consists of a central server, which also implements the interface of the robot kit. The server builds the game position and generates as many tile sequences as there are computers. Each computer then receives the game position and one sequence. On every computer the algorithms described above then search the game tree for the best move, the difference being that each computer reasons with its own sequence of tiles. Next, each computer assigns a score between zero and one to every possible move. These scores are collected on the central server, and the move with the highest total score is chosen. This approach is referred to here as the 'Monte Carlo' approach. The number of possible tile sequences is rather large. A middle course can be taken by, for example, giving each computer a single tile during the first thirty moves, and only switching to handing out sequences of two tiles per computer once few tiles remain in the bag. When only four tiles remain, the best move can be computed for every possible order of tiles, provided more than 4! = 24 computers are available. This approach differs from the fill heuristic in that it examines the short-term effects of tiles that may suddenly enter the game, whereas the fill heuristic determines the long-term consequences of tiles that may come into play. A class diagram of the distributed version is given in figure 4.4 and discussed in section 4.4.4.
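The server-side tallying step of this 'Monte Carlo' approach can be sketched as follows. The move representation (plain strings) and method names below are our own stand-ins for the thesis's RMI classes:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MonteCarloVote {
    /** Sums the per-move scores (each between 0 and 1) returned by the
     *  workers and picks the move with the highest total score. */
    static String pickMove(List<Map<String, Double>> workerScores) {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> scores : workerScores)
            for (Map.Entry<String, Double> e : scores.entrySet())
                totals.merge(e.getKey(), e.getValue(), Double::sum);
        String best = null;
        for (Map.Entry<String, Double> e : totals.entrySet())
            if (best == null || e.getValue() > totals.get(best))
                best = e.getKey();
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> worker1 = new HashMap<>();
        worker1.put("tile49@(0,1)", 1.0);   // best move for tile sequence 1
        worker1.put("tile12@(2,0)", 0.4);
        Map<String, Double> worker2 = new HashMap<>();
        worker2.put("tile12@(2,0)", 0.9);   // best move for tile sequence 2
        worker2.put("tile49@(0,1)", 0.2);
        // tile12@(2,0) totals 1.3 against 1.2, so it is chosen.
        System.out.println(pickMove(Arrays.asList(worker1, worker2)));
    }
}
```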
D.6 Results

D.6.1 Performance of the implementation
To compare different versions of a robot with other robots and with human players, a ranking system is needed. There are several ways to score a robot.

• Winning percentage: a series of games is played between two robots, and the percentage each one wins is recorded. The more games a robot wins, the better.
• Tournament score: in tournaments the difference in points also counts, not just winning or losing.
• Lobby ranking: every player who plays Tantrix online receives a ranking, which is adjusted after every game. In this system a beginner has 500 points, an advanced player more than 800 points, and a master player more than 950 points.
A first implementation of alpha-beta with a heuristic that took only the real line length and the virtual line length into account was already better than GoodBot, the improved version of Robot that Dave Dyer developed in reaction to this project. A distributed implementation of the 'Monte Carlo' approach, spreading random tile sequences over 48 computers, performed neither significantly better nor worse than the first implementation. The final implementation uses alpha-beta and combines the results of the real line length, virtual line length and fair share line length heuristics to evaluate game positions. This implementation is available on Tantrix.com and plays under the name Oliver. On 11 May 2005 an official tournament was held between GoodBot and Oliver. In a series of 200 games, Oliver managed to win 129; GoodBot won 61, and 10 ended in a draw. Oliver has thereby convincingly beaten the previously best robot.
D.6.2 Game statistics
Branching factor

Figure 5.3 shows the number of possible moves as a function of the number of tiles already played. The figure indicates that the average branching factor of the game tree is around 46 during the middlegame; only the branching of free moves was counted in this average. The figure also shows that the branching can vary strongly. During the endgame the average branching is much higher: once the bag is empty and the three extra restrictions are lifted, there are more than 100 possible moves.

Number of evaluated nodes

The corresponding figure shows a similar pattern. During the middlegame an average of 5000 game positions are evaluated; once the extra restrictions are lifted, this rises to 43000 positions.
D.7 A look at the future

Looking ahead, a number of improvements remain possible: the implementation can be made faster, or stronger. Several possibilities, such as building a dedicated interface, are presented in chapter 6.
D.8 Conclusion
In this thesis a program for playing Tantrix was designed and implemented. Tantrix is a strategic board game from New Zealand, invented by Mike McManaway. It consists of 56 hexagonal tiles with coloured lines on them. Tantrix is sold in shops and can also be played on Tantrix.com. Several robots have been programmed before, but they can easily be beaten by advanced players. The final implementation uses alpha-beta and combines the results of the real line length, virtual line length and fair share line length heuristics to evaluate game positions. This implementation is available on Tantrix.com and plays under the name Oliver. On 11 May 2005 an official tournament was held between GoodBot and Oliver. In a series of 200 games, Oliver managed to win 129; GoodBot won 61, and 10 ended in a draw. These results amount to a clear victory for Oliver. With this work I hope to have taken a next step towards a program that challenges the best Tantrix players.
Bibliography

[1] McManaway M., Game Pack spelregels, Tantrix Games International, New Zealand, 2004
[2] http://www.tantrix.com/english/TantrixTiles.html
[3] http://www.tantrix.com/english/TantrixHistory.html
[4] http://www.tantrix.com/english/TantrixGameRules.html
[5] http://www.tantrix.com/english/TantrixGameHowTo.html
[6] http://www.tantrix.com/english/TantrixRobotHow.html
[7] http://www.tantrix.com/english/TantrixStrategy.html
[8] http://www.andromeda.com/people/ddyer/tantrix/TantrixTips.html
[9] http://tournaments.tantrix.co.uk/results/strategy.shtml
[10] http://tournaments.tantrix.co.uk/rulessum.shtml
[11] Winston P., Artificial Intelligence, Addison-Wesley Publishing Company, 1992
[12] De Schreye D., Fundamentals of Artificial Intelligence and Knowledge-Based Systems, Cursusdienst VTK, 2003
[13] http://www.cs.berkeley.edu/~milch/cs188/alpha-beta.html
[14] Rich E., Knight K., Artificial Intelligence, McGraw-Hill, 1991
[15] Russell S., Norvig P., Artificial Intelligence: A Modern Approach, 2nd edition, Prentice Hall, 2003
[16] Luger G. F., Artificial Intelligence, 5th edition, Addison-Wesley, 2005
[17] Norvig P., Paradigms of Artificial Intelligence Programming, Morgan Kaufmann Publishers, 1992
[18] Mitchell T. M., Machine Learning, McGraw-Hill International Editions, 1997
[19] http://www.tantrix.com/english/TantrixGameLogon.html
[20] Larman C., Applying UML and Patterns, 2nd edition, Prentice Hall, 2002
[21] Gamma E., Helm R., Johnson R., Vlissides J., Design Patterns, Addison-Wesley, 1995
[22] http://www.tantrix.com/english/TantrixBattle.html
[23] Coulouris G., Dollimore J., Kindberg T., Distributed Systems: Concepts and Design, 3rd edition, Addison-Wesley, 2001
[24] Beirlant J., Van Dyck J., Statistiek en waarschijnlijkheidsleer, Cursusdienst VTK, 2004
[25] http://tournaments.tantrix.co.uk/rules.shtml#Tournament scoring
[26] http://www.tantrix.com/english/TantrixRankingsHow.html
[27] http://www.tantrix.com/english/TantrixMasterQualification.html
[28] http://www.tantrix.com/cgi-bin/gs rankings.cgi?country=cyberspace
List of Figures

1.1 A summary of the game
1.2 The current tile set (from Tantrix.com [2])
1.3 Four basic shapes (from Tantrix.com [7])
1.4 Four basic shapes
1.5 Colours of touching links have to match (from Tantrix.com [5])
1.6 The first restriction rule (from Tantrix.com [4])
1.7 The second restriction rule (from Tantrix.com [4])
1.8 The third restriction rule (from Tantrix.com [4])
1.9 A controlled side (from Tantrix.com [7])
1.10 A gameboard at the end of the game
2.1 An indirect red line with a length of five tiles (from Tantrix.com [7])
2.2 A small indirect red loop of four tiles (from Tantrix.com [7])
2.3 A small indirect red loop of seven tiles (from Tantrix.com [7])
2.4 A red loop of 9 tiles (from Tantrix.com [7])
2.5 Two isolated red lines (from Tantrix.com [7])
2.6 There are only two tiles left that fit in space A (from Tantrix.com [7])
2.7 The red line is blocked in A (from Tantrix.com [7])
2.8 The yellow line is blocked by one tile
2.9 A pair block (from cheekymbk-17-GoodBot-24-2005-04-21-1709)
2.10 A lookalike (from Tantrix.com [7])
2.11 Red is permanently blocked in B (from Tantrix.com [7])
3.1 The game tree of Tantrix
3.2 The idea of alphabeta (from Milch [13])
3.3 A game tree processed by the alphabeta algorithm (from Milch [13])
3.4 Real line length yellow vs blue: 11-7
3.5 Virtual line length yellow vs blue: 8.7 - 6
3.6 Fair share line length yellow vs green: 11 - 22.2
3.7 The orange dots are interesting for red
4.1 An example of a game situation (from Dave Dyer)
4.2 Tile 49 with rotation 0 and rotation 1
4.3 A class diagram of Oliver
4.4 A class diagram of the distributed version
5.1 Winning percentage evolving as games are played
5.2 Tournament score evolving as games are played
5.3 Minimum, maximum and average branching based on 1200 games
5.4 Average number of evaluated game positions for a free move, based on more than 5000 games
5.5 A key move by Oliver in this game
5.6 This would have been a better move
5.7 Intermediate result at move 18
5.8 The result of the mistake in move 16 in move 19
5.9 At move 3 Oliver makes a mistake
5.10 The result of move 3
5.11 This would have been a better move
5.12 At move 22 the mistake of move 3 is exploited