Prove or Disprove Assumption?

Question

I am writing a Q learning algorithm that will learn Tic Tac Toe by plyaing the game (against itself or people).

If you are familiar with Q Learning, you will know that the fewer state action pairs you have, the quicker it can learn.  So, I've written an algorithm to convert a single state into all of its logical equivalents and after many runs with hand made and random board configurations, it appears to be creating valid logically equivalent states, but I can't seem to prove to my self or disprove that this works for all states.

Here is the algorithm:
1.  If it is O's turn, swap all O's for X's and visa versa.  That way we can always assume it's X's turn. (reduces states by 1/2)
2.  Rotate the board to the right 3 times
3.  Flip board horizontally 
4.  Rotate horizontally flipped board 3 times to right.
5.  Flip  board vertically
6.  Rotate vertically flipped board to right 3 times..

The algorithm is based on the following assumptions, if any are false then I need to change the algorithm:
1.  Swapping X's for O'x and visa versa to control who's turn it is still creates the same game.
2.  Rotating the board is a valid operation. 
3.  Flipping horizontally or vertically are both valid.
4. If flipping and rotating are both valid, rotating a flipped version of the state is still valid.  

Here is one example of the algorithm running with a random board where it starts out as O's turn:
In this case it yields 8 logically equivilent states... (*this particular configuration isn't possible in tic tac toe, but all i'm worried about right now is the conversion to equivalent states). 

o o   
o o x 
o   x 
Turn: O

  o o 
x x   
x x x 
Turn: X

  x x 
o x x 
o   x 
Turn: X

o   x 
o x x 
  x x 
Turn: X

o o   
  x x 
x x x 
Turn: X

x   o 
x x o 
x x   
Turn: X

x x   
x x o 
x   o 
Turn: X

x x x 
  x x 
o o   
Turn: X

x x x 
x x   
  o o 
Turn: X

PurpleBlob · Accepted Answer

You can't prove or disprove your algorithm until you have defined what 'logically equivalent' means. From the point of view of playing Tic Tac Toe, where a win means 'three x's or 3 o's in a line', you want to demonstrate that the algorithm doesn't mess up the cells that make up the lines. For instance, if we number the Tic Tac Toe cells as follows:

1 2 3
4 5 6
7 8 9

Then we can define winning lines as {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {1, 4, 7}, {2, 5, 8}, {3, 6, 9}, {1, 5, 9}, {3, 5, 7}}. If a player has filled all the cells in at least one of these combinations, he/she has won. If not, he/she hasn't. 

Now, a rotation or a reflection of this square does not change the definition of winning lines. For instance, this still has the same winning lines:

3 6 9
2 5 8
1 4 7

(Winning lines are: {{3, 6, 9}, {2, 5, 8} ... and so on}
Swapping the pieces round also does not change the definition of a winning line. If, however, you exchange some of the cells, you create a different combination of winning lines:

1 2 4
3 5 6
7 8 9

This combination has a winning line of {4, 6, 9}, which was not present on the original board!! Therefore, if your algorithm produces all possible rotations and reflections of the 3x3 Tic Tac Toe board, it will give you all possible 'equivalent boards' in that a winning line on the original board must be a winning line on the transformed boards. 

Incidentally, a square has 8 symmetries, so your tranformations should give 8 equivalences.

Hope that helps

? · Answer

um...... wat?

Trending News

Prove or Disprove Assumption?

2 Answers