In How To Beat Your Dad At Camel Up (Part 4 – Reinforcement Learning) we looked at training a reinforcement learning agent to learn to play Camel Up. There was a problem. Our agent was getting too bogged down in the details! Today, we’re going to fix this and help the agent learn 120 times faster!
I mentioned in the previous article that the agent’s state space was too large, which would mean that the agent would not learn well. This is because the agent matches the exact game state it’s observing to a row in its table and looks up the best action to take. If the number of rows in the table is too large, then it will have trouble populating them meaningfully.
The underlying problem is that our representation of the game state has too many possible values. The information we’re tracking is too specific. We’re passing a list of the spaces each camel is on: for instance, red is on space 2, blue is on space 3, and so on. With 5 camels and 16 spaces, the number of possible position lists has an upper bound of $16^5 = 1{,}048{,}576$, which is huge!
I realized that the best action given a board state is fundamentally the same whether blue is leading or red is leading. If a camel is way out in front, I’m going to bet on it. It doesn’t matter the color! Using this idea, we can alter the representation of the state space to remove color, which will make the agent’s job a lot easier.
Let’s compare the old and the new approaches:
The old representation
We used an array specifying each camel’s space on the track, where, for example, the first element is always the red camel’s position. There was also a secondary array telling us how the camels are stacked on top of each other. Finally, we had arrays telling us which dice have been rolled and which betting tiles are available.
An example is:
- Camel Positions: [2, 5, 1, 5, 6],
- Stacking: [-1, 3, -1, -1, -1]
- -1 here denotes that nothing is stacked on top of that camel.
- The 3 at index 1 tells us that the camel at index 3 is stacked on top of the camel at index 1 (zero-indexed).
- Number of Betting Tiles Available: [2, 4, 4, 1, 0],
- Dice to Roll: [0, 0, 0, 0, 1]
The new system
We use an array where the first element is the board space of the camel in first place, the second the board space of the camel in second place, and so forth. This will intrinsically also tell us information about the stacks on the board. For example, if the array is [3, 3, 2, 2, 1] then we know that there are two stacks, one on space 3 containing 2 camels, and one on space 2 containing 2 camels.
We then need another array telling us which camel is in which position to fully know the board state. For decision-making purposes, it doesn’t matter whether the first camel is red or blue. If you just want to bet on the camel currently leading, it doesn’t matter what color it is!
An example is:
- Camel Positions: [10, 10, 9, 8, 6],
- Number of Betting Tiles Available: [1, 4, 4, 4, 4],
- Dice to Roll: [1, 1, 0, 1, 1]
- Order of Camels: [“red”, “blue”, “yellow”, “green”, “purple”]
The agent doesn’t need the “Order of Camels” array to make its action choice. Dropping it removes the information about which camel is which, so the state space shrinks by a factor of 5! = 120.
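To make this concrete, here is a minimal sketch of how the conversion from the old representation to the new one could look. The helper name, the colour ordering, and the decision to ignore stack order when breaking ties are my own assumptions, not code from the actual agent:

    def strip_colours(camel_positions, colours=("red", "blue", "yellow", "green", "purple")):
        """Hypothetical helper: convert colour-indexed positions to the new representation.

        camel_positions[i] is the board space of colours[i], e.g. [2, 5, 1, 5, 6].
        Returns the spaces sorted from first place to last, plus the colour order,
        which the agent can ignore when choosing an action.
        """
        # Sort camel indices by board space, leading camel first. Camels sharing a
        # space form a stack; breaking that tie properly would need the stacking
        # array, which this sketch ignores.
        order = sorted(range(len(camel_positions)),
                       key=lambda i: camel_positions[i], reverse=True)
        anonymous_positions = [camel_positions[i] for i in order]
        camel_order = [colours[i] for i in order]
        return anonymous_positions, camel_order

    print(strip_colours([2, 5, 1, 5, 6]))
    # ([6, 5, 5, 2, 1], ['purple', 'blue', 'green', 'red', 'yellow'])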
How many states in this representation?
So how many different board states are there in this representation? Let’s have a look. It’s always good to do something two ways and check you get the same answer, so we’ll count them with a brute-force method and with a combinatorics argument.
Brute Force!
We’re going to try to count all the possible states and sum them up. The argument goes:
- The first camel can be on any of the 16 spaces before the finish line.
- The second place camel can be on any space up to and including the first place camel.
- The third place camel can be on any space up to and including the second place camel, and so forth.
Converting this to summations, the number of combinations is:

$$\sum_{i=1}^{16} \sum_{j=1}^{i} \sum_{k=1}^{j} \sum_{m=1}^{k} \sum_{n=1}^{m} 1$$
We can evaluate this sum using Python. Remember that range(1, 17) yields 1 through 16; it does not include the upper limit:
    # i >= j >= k >= m >= n are the spaces of the camels from first to last place
    total = 0
    for i in range(1, 17):
        for j in range(1, i + 1):
            for k in range(1, j + 1):
                for m in range(1, k + 1):
                    for n in range(1, m + 1):
                        total += 1
    print(total)
This gives us the answer of 15,504 combinations.
Let’s derive this analytically as well. I’ll sketch the proof and give all the other results you need to complete it. The algebra will be left as an exercise for the reader (I’ve always wanted to say that).
Collapsing the sums from the inside out, the innermost sum is simply $\sum_{n=1}^{m} 1 = m$, and the next level gives $\sum_{m=1}^{k} m = \frac{k(k+1)}{2}$. You then expand the expression and get a sum of squares term and a sum of integers term. To complete the simplification you’ll need the following standard series expressions:

$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}, \qquad \sum_{i=1}^{N} i^3 = \frac{N^2(N+1)^2}{4}, \qquad \sum_{i=1}^{N} i^4 = \frac{N(N+1)(2N+1)(3N^2+3N-1)}{30}.$$

Using these, the final expression for a board with $N$ spaces is

$$\frac{N(N+1)(N+2)(N+3)(N+4)}{120}.$$

If you substitute in $N = 16$ then this formula agrees with the solution of 15,504 combinations.
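As a quick sanity check, here is the closed form evaluated in Python for $N = 16$ (just my own two-liner, not part of the agent’s code):

    # Closed-form count of camel position states for a board with N spaces.
    N = 16
    print(N * (N + 1) * (N + 2) * (N + 3) * (N + 4) // 120)  # 15504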
Combinatorics!
If we massage the formula above, it turns into something interesting. Firstly note that

$$\frac{N(N+1)(N+2)(N+3)(N+4)}{120} = \frac{(N+4)!}{(N-1)!\,5!} = \binom{N+4}{5},$$

and with 5 camels on a board of $N = 16$ spaces, this is the combination formula for $\binom{20}{5}$.

How does this make sense? Well, a board with $N$ spaces can be shown using $N - 1$ borders, which I’ll denote by |. I’ll denote camels with an x. For example, if I have two spaces, then a camel can be on the first space as x|, or on the second space as |x. So one | demarcates two spaces on the board. Similarly, 5 spaces can be shown using 4 |, and 16 spaces with 15 |. This means that $N$ spaces on the board can be shown with $N - 1$ copies of |.

We also have 5 camels to place on the board, and they can be on a space or in a stack with other camels. The board state can be shown using a string of |s and xs. For example, |||xx||xxx|||||||||| shows a board state with two camels stacked on space 4, and three camels stacked on space 6. The number of possible states becomes the number of different ways we can order this collection of symbols, which is $\binom{(N-1)+5}{5}$. If we substitute in $N = 16$ then this also gives us $\binom{20}{5} = 15{,}504$.
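If you’d like to verify the stars-and-bars argument numerically, note that each state in the new representation is just a multiset of 5 board spaces, which Python can enumerate directly (a small check of my own, assuming spaces are numbered 1 to 16):

    from itertools import combinations_with_replacement
    from math import comb

    # Every state in the new representation is a multiset of 5 spaces drawn from 1..16.
    states = list(combinations_with_replacement(range(1, 17), 5))
    print(len(states))   # 15504
    print(comb(20, 5))   # 15504 -- the stars-and-bars count C(16 + 5 - 1, 5)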
So, both of our arguments give us 15,504 for the number of possible camel position combinations. Not all of these will be achievable; for example, the game mechanics make it impossible to have 4 camels on space 16 while 1 camel is still on space 1. So 15,504 is an upper bound on the number of reachable states.
What’s Next?
In the next article, we’ll apply this to our reinforcement learning model and see how it improves! We should see a marked improvement in our performance. We’ll also apply some other optimizations to get our AI agent beating us in no time!