Welcome to week 6!
Last week, we built a foundation for AI integration in our Tron game using a simple random-movement AI. This week, we're taking a significant leap forward by implementing reinforcement learning (RL). We'll modify our game to support training an AI agent that learns to survive longer in the game environment.
We'll be updating several files and creating new ones. Here's what we'll be doing:
Modifying the Player class to work with our RL agent
Creating a new training script
Updating our game runner
Implementing the RL agent itself
Let's go through each file:
Download the following file:
rl_ai.py
Let's break down our DQN implementation piece by piece to understand how it works.
Overview
The DQN implementation consists of two main classes: DQN for the neural network architecture and RLAgent for the reinforcement learning logic. Let's understand how each class works and how they interact.
The DQN class defines our neural network architecture that will learn to predict Q-values for each possible action in a given state.
Structure
Inherits from PyTorch's nn.Module
Uses three fully connected layers
Takes flattened game state as input
Outputs Q-values for each possible direction
Methods
__init__: Sets up the neural network layers
forward: Defines how input data flows through the network
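To make the structure concrete, here's a minimal sketch of what a network like this could look like. The layer sizes are assumptions (a 7x7 state window flattens to 49 inputs, and four outputs give one Q-value per direction); check rl_ai.py for the actual values.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Three fully connected layers mapping a flattened state to Q-values."""

    def __init__(self, state_size=49, hidden_size=128, action_size=4):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        # ReLU on the hidden layers; the output layer stays linear so
        # Q-values can be any real number.
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
```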
The RLAgent class handles all reinforcement learning logic, including state processing, action selection, and training.
Methods
get_state: Converts the game board into a numerical representation
Creates a 7x7 window around the player
Represents different elements (empty space, walls, trails)
Returns flattened state array
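Here's a rough sketch of the windowing idea, written as a standalone function for clarity. The encoding is simplified to a binary obstacle map (the real implementation may use distinct values for walls and trails), and the board is assumed to be a 2D grid where 0 means empty:

```python
import numpy as np

def get_state(board, x, y, window=7):
    """Flatten a window x window view of the board centered on the player."""
    half = window // 2
    height, width = len(board), len(board[0])
    # Start with everything marked as an obstacle; off-board cells stay
    # that way, which makes the arena edge look like a wall.
    state = np.ones((window, window), dtype=np.float32)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            bx, by = x + dx, y + dy
            if 0 <= bx < width and 0 <= by < height:
                state[dy + half, dx + half] = 0.0 if board[by][bx] == 0 else 1.0
    return state.flatten()  # shape (49,) for a 7x7 window
```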
get_valid_direction: Determines legal moves
Prevents 180-degree turns
Returns a list of valid direction vectors
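Since a 180-degree turn is just the negation of the current direction vector, filtering it out takes one comparison. A sketch, assuming directions are (dx, dy) tuples:

```python
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right

def get_valid_directions(current_direction):
    """Every direction except the exact reverse of the current one."""
    dx, dy = current_direction
    return [d for d in DIRECTIONS if d != (-dx, -dy)]
```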
get_direction: Chooses the next action
Implements epsilon-greedy strategy
Handles exploration vs exploitation
Ensures only valid moves are selected
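Epsilon-greedy means: with probability epsilon, pick a random valid move (exploration); otherwise pick the valid move with the highest predicted Q-value (exploitation). A sketch, building on the DIRECTIONS list and helper from the previous snippet:

```python
import random
import torch

def get_direction(model, state, current_direction, epsilon):
    """Epsilon-greedy action selection restricted to valid moves."""
    valid = get_valid_directions(current_direction)
    if random.random() < epsilon:
        return random.choice(valid)  # explore
    with torch.no_grad():
        q_values = model(torch.tensor(state).unsqueeze(0)).squeeze(0)
    # Exploit: among valid moves, take the one with the highest Q-value.
    return max(valid, key=lambda d: q_values[DIRECTIONS.index(d)].item())
```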
remember: Stores experiences in replay buffer
Saves state, action, reward, next state, done flag
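The replay buffer is typically nothing fancier than a bounded deque of tuples, something like:

```python
from collections import deque

memory = deque(maxlen=10_000)  # oldest experiences fall off the end

def remember(state, action, reward, next_state, done):
    """Store one transition for later batch sampling."""
    memory.append((state, action, reward, next_state, done))
```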
replay: Handles the training process
Samples a random batch of experiences
Calculates target Q-values
Updates the neural network
Decays the exploration rate
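The targets come from the Bellman equation: for a non-terminal transition the target is reward + gamma * max Q(next_state, a), and just the reward when the episode ended. Here's a sketch of one training step, assuming the memory deque above, actions stored as indices into DIRECTIONS, and an optimizer such as torch.optim.Adam(model.parameters()):

```python
import random
import numpy as np
import torch
import torch.nn as nn

def replay(model, optimizer, batch_size=64, gamma=0.95):
    """Sample a random batch from memory and take one gradient step."""
    if len(memory) < batch_size:
        return  # not enough experience collected yet
    batch = random.sample(memory, batch_size)
    states = torch.tensor(np.array([b[0] for b in batch]))
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.tensor(np.array([b[3] for b in batch]))
    dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    # Q-values the network currently assigns to the actions we actually took.
    q_pred = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bellman targets: reward, plus the discounted best next-state value
    # unless the episode ended there.
    with torch.no_grad():
        q_next = model(next_states).max(dim=1).values
    q_target = rewards + gamma * q_next * (1.0 - dones)

    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The exploration-rate decay (e.g. epsilon = max(epsilon_min, epsilon * 0.995)) would run at the end of this function, gradually shifting the agent from exploring to exploiting what it has learned.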
train: Manages a single training episode
Tracks episode progress
Handles action selection and execution
Manages rewards and experience storage
Triggers periodic training
Returns episode metrics
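Tying it together, a single episode might look roughly like this. The game interface (the reset and step methods, the reward values, the replay frequency) is entirely hypothetical and just illustrates the flow; the real train method drives the actual Tron game loop:

```python
def train_episode(agent, game):
    """Run one episode and return (total_reward, steps survived)."""
    game.reset()
    state = agent.get_state(game)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = agent.get_direction(state)
        done = game.step(action)            # hypothetical: True once we crash
        next_state = agent.get_state(game)
        reward = -10.0 if done else 0.1     # crash penalty vs. survival bonus
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward
        steps += 1
        if steps % 4 == 0:                  # periodic training
            agent.replay()
    return total_reward, steps
```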
save_model: Saves trained weights
load_model: Loads pre-trained weights
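Both of these are thin wrappers around PyTorch's state-dict utilities; the file name here is just a placeholder:

```python
import torch

def save_model(model, path="tron_dqn.pth"):
    torch.save(model.state_dict(), path)

def load_model(model, path="tron_dqn.pth"):
    model.load_state_dict(torch.load(path))
    model.eval()  # inference mode when playing with a trained model
```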
This class will act as a template for future improvements. When you run it, you will notice it performs rather poorly. Next week we will continue improving our Tron bot.
Thanks for reading :)