Hide code cell source

# UNCOMMENT FOR INTERACTIVE PLOTTING
# %matplotlib notebook
%matplotlib widget
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import animation, rc, cm
import time
from snakelib import FastSnake, show_gui, NeuralAgent
import snakelib as snklib
import utils
from scipy.ndimage import label

rc("animation", html="html5")

Snake, sensors, and genetic learning#

This notebook is an example of machine learning applied to video games. You will use the legendary game Snake, rewritten in Python for the occasion, and try to develop an automatic game strategy: first by hand, then by using a genetic algorithm to evolve a neural network. Graphical examples will let you watch the game performance evolve.

Required files

In order to work properly, this notebook requires the following modules in its folder:

  • snakelib.py

  • utils.py

Put them in your working directory along with this notebook.

This practical session is organized in three parts:

  1. Discovering the game and its interface (about 30 min)

  2. Designing a manual agent from sensors (about 1 h 30)

  3. Training an automatic agent with a recurrent neural network and a genetic algorithm

Throughout the practical, the observation space is intentionally restricted to two official sensors:

  • default: a tiny, historical, purely local sensor

  • label: a more advanced sensor based on reachable free cells.

The pedagogical goal is to compare two levels of design:

  • a fully hand-written policy

  • a parametric policy learned automatically

An important lesson of this tutorial is that the most informative representation is not always the easiest one to optimize.

Part 1: Discovering the game and its interface#

In this tutorial, we are going to teach a machine how to play the Snake game. In our implementation, we added a new option snake_max_length to cap the length of the snake. If set to None, the game is the standard version.

The goal of this first part is to understand:

  • the game grid

  • the available commands

  • the relative direction convention

  • the interactive display

np.random.seed(0)  # Fixing the seed
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
snake = FastSnake(Nrow=10, Ncol=10, snake_max_length=3)
display(show_gui(snake, ax))
plt.close()

1. The Grid#

The game grid is composed of a number of columns noted Ncol and a number of rows noted Nrow.

print(
    f"Grid dimensions: {snake.Ncol} columns, {snake.Nrow} rows, a total of {snake.Ncell} elements"
)
Grid dimensions: 10 columns, 10 rows, a total of 100 elements

Each element of the grid is identified by a unique index.

Hide code cell source

utils.plot_grid(snake)
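The indexing is row-major: a cell at row r and column c has the flat index r * Ncol + c, which is the convention used later in this notebook when the fruit is placed at (Nrow - 2) * Ncol + Ncol - 2. A minimal sketch of the conversion (the helper names are ours, not part of snakelib):

```python
Nrow, Ncol = 10, 10

def cell_index(row, col, Ncol=Ncol):
    """Convert (row, col) coordinates to the flat, row-major grid index."""
    return row * Ncol + col

def cell_coords(index, Ncol=Ncol):
    """Convert a flat grid index back to (row, col) coordinates."""
    return divmod(index, Ncol)

print(cell_index(8, 8))   # (Nrow - 2) * Ncol + Ncol - 2 = 88, the fruit cell used later
print(cell_coords(88))    # -> (8, 8)
```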

2. Commands#

At each step, the snake does not choose an absolute direction. It chooses a relative turn with respect to its current heading:

Here is an example of commands that move the snake:

# Moving the snake
np.random.seed(0)  # Fixing the seed
snake.reset()  # Reset snake position
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(1)  # Go Left

# Access to neighbors element positions
print(f"Current neighbors index {snake.get_neighbors_pos()}")
Current neighbors index [44 35 24]

Note

Notice that the value -1 turns the snake to the right, 0 keeps it going straight ahead and 1 turns it to the left, relative to the current snake direction.

Here are the results of these successive commands:

Hide code cell source

utils.plot_snake(snake)
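The relative-turn convention can also be read as modular arithmetic on the absolute direction returned later by `get_current_direction()` (0 = right, 1 = up, 2 = left, 3 = down). A minimal sketch, with a helper name of our own:

```python
def apply_turn(direction, turn):
    """Absolute direction after a relative turn.

    direction: 0 = right, 1 = up, 2 = left, 3 = down
    turn: -1 = turn right, 0 = go straight, 1 = turn left
    """
    return (direction + turn) % 4

print(apply_turn(0, -1))  # heading right, turn right -> 3 (down)
print(apply_turn(0, 1))   # heading right, turn left -> 1 (up)
```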

Question 1#

Starting from the initial state, which sequence of commands makes the snake eat the first fruit?

Write your answer in the next cell, then check it graphically.

# Initial snake position
np.random.seed(0)  # Fixing the seed
snake.reset()  # Reset snake position
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(1)  # Go Left

# Your answer

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
display(show_gui(snake, ax))

3. Available sensors#

It turns out that in their infinite wisdom, the game designers gave the snake some very weird sensors to help the players. Here’s how to display the sensor data:

default sensor#

# Moving the snake
np.random.seed(0)  # Fixing the seed
snake.reset()  # Reset snake position
snake.display_sensor_method = "default"
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
display(show_gui(snake, ax))

The first 3 elements indicate respectively the nature of the cells directly to the right, in front of and to the left of the snake’s head (relative to its direction).

The values of the elements can be:

  • 1.0 = fruit present

  • -1.0 = forbidden position (lava or snake tail)

  • 0.0 = nothing special present

Hide code cell source

utils.plot_snake(snake)

Example

The head of the snake is in position 11. To its left there is lava, so a value of -1 is read on the third element of the sensor output (the elements are ordered right, front, left). To its right, there is nothing, so a value of 0 is read on the first element.

The last two elements give the relative direction of the fruit with respect to the head of the snake. Indeed, there is an angle \(\theta\) between the direction of the snake and the direction of the fruit on the grid.

These last two values are respectively the sign of \(\cos(\theta)\) and the sign of \(\sin(\theta)\).
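As an illustration, the two signs can be computed from the head position, the fruit position and the heading with a dot product and a cross product. A sketch assuming an (x, y) frame with y pointing up (the exact frame used by snakelib may differ), with a helper name of our own:

```python
import numpy as np

# Unit vectors per absolute direction: 0 = right, 1 = up, 2 = left, 3 = down
DIRS = {0: np.array([1, 0]), 1: np.array([0, 1]),
        2: np.array([-1, 0]), 3: np.array([0, -1])}

def fruit_sensors(head_xy, fruit_xy, direction):
    """Signs of cos(theta) and sin(theta) between the heading and the fruit vector."""
    d = DIRS[direction]
    v = np.asarray(fruit_xy) - np.asarray(head_xy)
    cos_sign = np.sign(d @ v)                      # > 0: fruit is ahead
    sin_sign = np.sign(d[0] * v[1] - d[1] * v[0])  # > 0: fruit is to the left
    return int(cos_sign), int(sin_sign)

print(fruit_sensors((0, 0), (3, 1), 0))  # fruit ahead and slightly left -> (1, 1)
```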

# Moving the snake
np.random.seed(0)  # Fixing the seed
snake.reset()  # Reset snake position
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(-1)  # Go Right
snake.turn(0)  # Go Front
snake.turn(1)  # Go Left

# Snake current direction
direction = snake.get_current_direction()
print(f"Current direction: {direction}")
Current direction: 0
The current direction is deduced from the positions of the snake's head and neck. The returned integer means:

    0 = right
    1 = up
    2 = left
    3 = down
    -1 = the direction could not be determined

Hide code cell source

utils.plot_snake_theta(snake)

label sensor#

The label sensor has the same shape as the default sensor. The main difference is that its first three elements measure the number of free cells that remain reachable after a given choice: if that number is the maximal one among the three moves, the sensor returns 1; otherwise, including the case where no cell is reachable at all, it returns -1.
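Counting the free cells reachable from a given position is a connected-component problem, which is presumably why scipy.ndimage.label is imported at the top of this notebook. A sketch on a toy boolean grid (the helper is ours, not the snakelib implementation):

```python
import numpy as np
from scipy.ndimage import label

def reachable_cells(free_grid, start):
    """Number of free cells connected to `start` (4-connectivity).

    free_grid: 2D boolean array, True where the cell is free.
    """
    labels, _ = label(free_grid)   # default structuring element = 4-connectivity
    region = labels[start]
    if region == 0:                # the start cell itself is not free
        return 0
    return int((labels == region).sum())

grid = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=bool)
print(reachable_cells(grid, (0, 0)))  # -> 3: the top-left region of free cells
```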

# Moving the snake
np.random.seed(0)  # Fixing the seed
snake.reset()  # Reset snake position
snake.display_sensor_method = "label"
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
display(show_gui(snake, ax))

Part 2 - Designing a manual agent#

In this part, you must build an agent that plays automatically without learning.

Your agent must be a Python function that receives:

  • one sensor vector

and returns:

  • one move in {-1, 0, 1}

  • an optional updated memory state

This is the core part of the practical. It forces you to understand what the sensors really contain and whether memory helps or not.

Question 2#

Write a function manual_agent_step(sensors) that returns a turn.

Suggested workflow:

  • start with sensor_method = default and snake_max_length = 3

  • obtain a simple agent that survives and occasionally eats a fruit

  • increase snake_max_length and finally use snake_max_length = None

  • try the label sensor and reiterate.

Here is an example of a very dummy agent:

def manual_agent_step(sensors):
    """
    A dummy agent that moves in a random direction.
    """
    choice = np.random.randint(3) - 1  # A RANDOM CHOICE, REPLACE WITH YOUR OWN CODE
    return choice


snake = FastSnake(Nrow=10, Ncol=10, snake_max_length=3)
my_choice = manual_agent_step(snake.sensors())
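As a starting point beyond the random agent, here is a sketch of a greedy reflex policy for the 5-element default sensor; the thresholds and tie-breaking rules are our own choices, not an official solution:

```python
import numpy as np

def greedy_agent_step(sensors):
    """Greedy reflex agent for the 5-element default sensor.

    sensors[0:3]: contents of the right / front / left cells
                  (1.0 fruit, -1.0 forbidden, 0.0 free).
    sensors[3:5]: sign of cos(theta) and sign of sin(theta) towards the fruit.
    """
    moves = [-1, 0, 1]                      # right, front, left
    local = np.asarray(sensors[:3], dtype=float)
    if (local == 1.0).any():                # fruit adjacent: take it
        return moves[int(np.argmax(local))]
    safe = local >= 0.0                     # moves that do not kill the snake
    if not safe.any():
        return 0                            # trapped: go straight
    # sin sign > 0 means the fruit is on the left, < 0 on the right
    preferred = 1 if sensors[4] > 0 else (-1 if sensors[4] < 0 else 0)
    if safe[moves.index(preferred)]:
        return preferred
    return moves[int(np.argmax(safe))]      # fall back to the first safe move
```

This agent never looks more than one cell ahead, so it still traps itself when the snake gets long; that limitation is precisely what the label sensor is meant to address.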

Automatic play with graphic output#

Hide code cell source

snake5 = FastSnake(Nrow=10, Ncol=10)


def updatefig(*args):
    sensors = snake5.sensors(method="label")
    my_choice = manual_agent_step(sensors)
    snake5.turn(my_choice)
    im2.set_array(snake5.grid)
    score = snake5.score
    iteration = snake5.iteration
    title2.set_text(f"Score = {score} Iteration = {iteration}")
    if snake5.status != 0:
        snake5.reset()
    return (im2,)


fig2, ax2 = plt.subplots()
ax2.axis("off")
im2 = plt.imshow(snake5.grid, interpolation="nearest", animated=True)
score = snake5.score
title2 = ax2.set_title(f"Score = {score}", animated=True)
anim = animation.FuncAnimation(fig2, updatefig, frames=40, interval=50, blit=True)
# plt.show()  # UNCOMMENT TO PLAY
plt.close()  # COMMENT TO PLAY
anim  # COMMENT TO PLAY

Hide code cell source

anim.pause()
plt.close(fig2)

Question 3#

Build a small benchmark protocol to evaluate your manual strategy on several random seeds.

The minimum expected output is:

  • a mean score

  • a maximum score

  • a mean number of turns

If you compare several sensor designs, keep the results in a table.

Hide code cell source

Nagent_ids = 200
max_turns = 1000
snake3 = FastSnake(Nrow=10, Ncol=10)
scores = np.zeros(Nagent_ids)
turns = np.zeros(Nagent_ids)
# for agent_id in tqdm.trange(Nagent_ids):
for agent_id in range(Nagent_ids):
    snake3.reset()
    turn = 0
    while snake3.status == 0:
        sensors = snake3.sensors()
        my_choice = manual_agent_step(sensors)
        snake3.turn(my_choice)
        turn += 1
        if turn >= max_turns:
            break
    scores[agent_id] = snake3.score
    turns[agent_id] = turn
# CODE HERE
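The aggregation requested in Question 3 can be computed directly from the `scores` and `turns` arrays filled by the loop above; a minimal sketch on stand-in data (replace the stand-in arrays with the real ones):

```python
import numpy as np

# Stand-ins for the arrays filled by the benchmark loop above
scores = np.array([0.0, 2.0, 1.0, 3.0])
turns = np.array([120.0, 340.0, 210.0, 500.0])

summary = {
    "mean score": scores.mean(),
    "max score": scores.max(),
    "mean turns": turns.mean(),
}
for name, value in summary.items():
    print(f"{name:>11}: {value:.2f}")
```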

Part 3 - Automatic learning with a neural network and a genetic algorithm#

In this part, the hand-written decision function is replaced by a neural network.

At each step, the network receives:

  • the current observation

and returns:

  • three action scores, one for each possible move

Learning is performed without gradients, only by evolving a population of parameters in a genetic algorithm.
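The select / recombine / mutate loop can be illustrated on a toy one-dimensional problem before applying it to network weights; a minimal sketch with our own hyperparameter choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, pop, n_gen=50, keep_ratio=0.2, sigma=0.1):
    """Tiny gradient-free genetic loop: select the best, recombine, mutate."""
    n_keep = max(2, int(keep_ratio * len(pop)))
    for _ in range(n_gen):
        order = np.argsort([fitness(p) for p in pop])[::-1]
        parents = pop[order][:n_keep]             # selection
        children = []
        for _ in range(len(pop) - n_keep):
            a, b = rng.choice(n_keep, 2, replace=False)
            w = rng.random()
            child = w * parents[a] + (1 - w) * parents[b]    # blend crossover
            children.append(child + rng.normal(0.0, sigma))  # Gaussian mutation
        pop = np.concatenate([parents, np.array(children)])
    return pop[0]

# Maximize a toy fitness with a single optimum at x = 3
best = evolve(lambda x: -(x - 3.0) ** 2, rng.uniform(-10, 10, size=30))
print(round(float(best), 2))  # close to 3.0
```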

What is expected in this part#

You are expected to do more than just run the code. Your work should include:

  • a first successful training run with the default sensor

  • at least one comparison involving the sensor choice, for example default versus label

  • an estimation of the impact of snake_max_length on the learning performance

  • plots of score and fitness across generations

  • one visualization of the best learned agent

  • a short discussion of what worked, what did not work, and why

A first remark#

In this tutorial, the main goal is not to prove that richer sensors improve performance. In fact, under a small CPU budget and with pure genetic optimization, the simplest setup is sometimes the easiest one to train.

So part 3 should be read as an experiment on the trade-off between:

  • information quality

  • model size

  • optimization difficulty

  • computational budget

Question 4#

Complete and run the following genetic workflow.

Required progression#

  1. start with sensor_method = "default"

  2. start with snake_max_length = 2.

  3. obtain a first baseline learning curve

  4. only then move to a richer sensor such as label

Minimum expected output#

  • one baseline run with the default sensor

  • one comparison between default and label

  • a discussion on the effect of the snake_max_length option

  • the score / fitness plot

  • one animation of the best learned agent

  • a short written conclusion

Another remark#

In this tutorial, it is normal if the richer sensors do not outperform the default sensor. This does not mean that they are bad sensors. It means that a richer representation can also create a harder optimization problem for the genetic algorithm.

Questions to discuss#

  • Does a richer sensor always help?

  • What limits the maximum score?

  • What is the effect of the snake_max_length option?

  • With a fixed CPU budget, is it better to use a larger population or more generations?

  • Is it easier to improve the model by changing the sensor, the memory, or the genetic hyperparameters?

  • Why can the simplest sensor be the most effective one in this setup?

# NEURAL FUNCTIONS
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ReLu(x):
    return np.where(x > 0.0, x, 0.0)


def identity(x):
    return x


def arg_max(x):
    return int(np.where(x == x.max())[0][0])


# GENETIC ALGORITHM SETUP

Npop = 100  # NUMBER OF INDIVIDUALS IN THE POPULATION
Ngen = 5  # NUMBER OF GENERATIONS OF EVOLUTION
Ntries = 1  # NUMBER OF TRIES PER INDIVIDUAL PER GENERATION
Net_struct = 5, 3  # NETWORK STRUCTURE
keep_ratio = 0.2  # GENETIC ALGORITHM KEEP RATIO
mutation_ratio = 0.1  # MUTATION RATIO
mutation_sigma = 1.0  # MUTATION GAUSSIAN SIGMA
max_turns = 600  # MAX PLAY TURNS PER TRIAL
neural_functions = [
    identity
]  # NEURAL FUNCTION VECTOR, SELECT RELU, SIGMOID OR IDENTITY
sensor_method = "default"

# PRE-PROCESSING
keep_individuals = int(keep_ratio * Npop)
Nw = 0
for i in range(len(Net_struct) - 1):
    nin = Net_struct[i]
    nout = Net_struct[i + 1]
    Nw += (nin + 1) * nout
# all_weights = np.random.normal(loc=0.0, scale=1.0, size=(Npop, Nw))
all_weights = (np.random.rand(Npop, Nw) - 0.5) * 2.0
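Each dense layer of the network carries (nin + 1) * nout weights, the +1 accounting for a bias input; for the (5, 3) structure above this gives 18 parameters per individual. A quick check of the counting loop:

```python
def count_weights(structure):
    """Total number of weights (including biases) of a dense network."""
    return sum((nin + 1) * nout
               for nin, nout in zip(structure[:-1], structure[1:]))

print(count_weights((5, 3)))     # (5 + 1) * 3 = 18
print(count_weights((5, 4, 3)))  # (5 + 1) * 4 + (4 + 1) * 3 = 39
```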

agents = []
for agent_id in range(Npop):
    weights = all_weights[agent_id]
    agent = NeuralAgent(
        weights=weights, structure=Net_struct, neural_functions=neural_functions
    )
    agents.append(agent)
agent_functions = [agent.get_caller() for agent in agents]
func = agent_functions[0]
total_generations = 1
generation_store = []
best_score_store = []
# DATA STORAGE
snake4 = FastSnake(
    Nrow=10,
    Ncol=10,
    snake_max_length=3,
    record_turns=False,
    recorded_sensors_method=sensor_method,
)
scores = np.zeros(Npop)
turns = np.zeros(Npop)
tries_scores = np.zeros(Ntries)
tries_turns = np.zeros(Ntries)
new_all_weights = np.zeros_like(all_weights)
turn_ids = np.array([-1.0, 0.0, 1.0])
for generation in range(Ngen):
    print(f"Generation: {total_generations}")
    generation_store.append(total_generations)
    scores[:] = 0.0
    turns[:] = 0.0
    for agent_id in range(Npop):
        # np.random.seed(0)
        tries_scores[:] = 0.0
        tries_turns[:] = 0.0
        agent_func = agent_functions[agent_id]
        for trial in range(Ntries):
            snake4.reset()
            Ncol = snake4.Ncol
            Nrow = snake4.Nrow
            snake4.fruit_position = (Nrow - 2) * Ncol + Ncol - 2
            turn = 0
            while snake4.status == 0:
                sensors = snake4.sensors(method=sensor_method)
                my_choice = turn_ids[arg_max(agent_func(sensors))]
                snake4.turn(my_choice)
                turn += 1
                if turn >= max_turns:
                    break
            tries_scores[trial] = snake4.score
            tries_turns[trial] = turn
        scores[agent_id] = tries_scores.mean()
        turns[agent_id] = tries_turns.mean()
    perf = scores * 100 - turns
    order = np.argsort(perf)[::-1]
    new_all_weights[:] = 0.0
    # SELECTION
    new_all_weights[:keep_individuals] = all_weights[order][:keep_individuals]
    # HYBRIDATION
    keep_range = np.arange(keep_individuals)
    for indiv in range(keep_individuals, Npop):
        parents = np.random.choice(keep_range, 2)
        while parents[1] == parents[0]:
            parents = np.random.choice(keep_range, 2)
        pw = np.random.rand(Nw)
        new_all_weights[indiv] = (
            new_all_weights[parents[0]] * pw + (1.0 - pw) * new_all_weights[parents[1]]
        )

        # MUTATION:
        if np.random.rand() <= mutation_ratio:
            # mutation_loc = np.random.randint(Nw)
            new_all_weights[indiv] *= np.random.normal(
                loc=1.0, scale=mutation_sigma, size=Nw
            )
    total_generations += 1
    all_weights[:] = new_all_weights

    data = pd.DataFrame(
        {"score": scores[order], "turns": turns[order], "perf": perf[order]}
    )  # .sort_values( "perf", ascending=False    )
    print(data.head(2))
    print(f"=> best score = {scores.max()}")
    best_score_store.append(data.iloc[0].score)
print("FINISHED")
plt.figure("Snake at the gym !")
plt.clf()
plt.plot(generation_store, best_score_store, "or-")
plt.xlabel("Generation")
plt.ylabel("Best Score")
plt.grid()
plt.show()
Generation: 1
   score  turns    perf
0   15.0  110.0  1390.0
1   11.0   96.0  1004.0
=> best score = 15.0
Generation: 2
   score  turns     perf
0  108.0  600.0  10200.0
1   82.0  600.0   7600.0
=> best score = 108.0
Generation: 3
   score  turns     perf
0  108.0  600.0  10200.0
1  101.0  600.0   9500.0
=> best score = 108.0
Generation: 4
   score  turns     perf
0  113.0  600.0  10700.0
1  113.0  600.0  10700.0
=> best score = 113.0
Generation: 5
   score  turns     perf
0  118.0  600.0  11200.0
1  114.0  600.0  10800.0
=> best score = 118.0
FINISHED
snake5 = FastSnake(
    Nrow=10,
    Ncol=10,
    snake_max_length=3,
    record_turns=True,
    recorded_sensors_method=sensor_method,
)
# np.random.seed(0)
# weights = all_weights[0]  # BEST AGENT
best_agent_func = agent_functions[0]
turn_ids = np.array([-1.0, 0.0, 1.0])
np.random.seed(0)


def updatefig(*args):
    if snake5.status == 0:
        sensors = snake5.sensors(method=sensor_method)
        my_choice = turn_ids[arg_max(best_agent_func(sensors))]

        snake5.turn(my_choice)
        im2.set_array(snake5.grid)
        score = snake5.score
        iteration = snake5.iteration
        title2.set_text(f"Score = {score} Iteration = {iteration}")

    else:
        title2.set_text(f"You died with score = {snake5.score}")
    return (im2,)


score = 0
fig2, ax2 = plt.subplots()
ax2.axis("off")
title2 = ax2.set_title(f"Score = {score}", animated=True)
im2 = plt.imshow(snake5.grid, interpolation="nearest", animated=True)
anim = animation.FuncAnimation(fig2, updatefig, frames=1000, interval=30, blit=True)
# plt.show()
plt.close()
anim
turns = snake5.recorded_turns
sensors = np.array(snake5.recorded_sensors)

out = {
    "turn": turns,
}
for i, s in enumerate(sensors.T):
    out[f"s{i}"] = s

data = pd.DataFrame(out)
data.iloc[-10:]
turn s0 s1 s2 s3 s4
991 1.0 0.0 -1.0 1.0 0.0 1.0
992 1.0 -1.0 0.0 0.0 -1.0 1.0
993 0.0 0.0 0.0 0.0 1.0 1.0
994 0.0 0.0 0.0 0.0 1.0 1.0
995 0.0 0.0 0.0 0.0 1.0 1.0
996 0.0 0.0 0.0 0.0 1.0 1.0
997 0.0 0.0 0.0 0.0 1.0 1.0
998 1.0 0.0 0.0 0.0 0.0 1.0
999 0.0 0.0 0.0 0.0 1.0 0.0
1000 0.0 0.0 0.0 0.0 1.0 0.0

Summary#

At the end of the practical, you should be able to discuss:

  • the choice of sensor (default, label)

  • the difference between a hand-written strategy and a learned one

  • why a smaller representation can be easier to optimize than a richer one

  • the effect of population size and mutation

  • the limits of pure genetic learning on Snake