Welcome to week 6!
Last week, we built a foundation for AI integration in our Tron game using a simple random-movement AI. This week, we're taking a significant leap forward by implementing reinforcement learning (RL). We'll modify our game to support training an AI agent that learns to survive longer in the game environment.
We'll be updating several files and creating new ones. Here's what we'll be doing:
Modifying the Player class to work with our RL agent
Creating a new training script
Updating our game runner
Implementing the RL agent itself
Let's go through each file:
Download the following file:
rl_ai.py
Let's break down our DQN implementation piece by piece to understand how it works.
Overview
The DQN implementation consists of two main classes: DQN for the neural network architecture and RLAgent for the reinforcement learning logic. Let's understand how each class works and how they interact.
The DQN class defines our neural network architecture that will learn to predict Q-values for each possible action in a given state.
Structure
Inherits from PyTorch's nn.Module
Uses three fully connected layers
Takes flattened game state as input
Outputs Q-values for each possible direction
Methods
__init__: Sets up the neural network layers
forward: Defines how input data flows through the network
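To make the structure concrete, here's a minimal sketch of what a network like this could look like. The layer sizes are assumptions (a 7x7 state window flattens to 49 inputs, and four outputs give one Q-value per direction); check rl_ai.py for the actual values.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Three fully connected layers mapping a flattened state to Q-values."""

    def __init__(self, state_size=49, hidden_size=128, action_size=4):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        # ReLU on the hidden layers; the output layer stays linear so
        # Q-values can be any real number.
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
```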
The RLAgent class handles all reinforcement learning logic, including state processing, action selection, and training.
Methods
get_state: Converts the game board into a numerical representation
Creates a 7x7 window around the player
Represents different elements (empty space, walls, trails)
Returns flattened state array
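Here's a rough sketch of the windowing idea, written as a standalone function for clarity. The encoding is simplified to a binary obstacle map (the real implementation may use distinct values for walls and trails), and the board is assumed to be a 2D grid where 0 means empty:

```python
import numpy as np

def get_state(board, x, y, window=7):
    """Flatten a window x window view of the board centered on the player."""
    half = window // 2
    height, width = len(board), len(board[0])
    # Start with everything marked as an obstacle; off-board cells stay
    # that way, which makes the arena edge look like a wall.
    state = np.ones((window, window), dtype=np.float32)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            bx, by = x + dx, y + dy
            if 0 <= bx < width and 0 <= by < height:
                state[dy + half, dx + half] = 0.0 if board[by][bx] == 0 else 1.0
    return state.flatten()  # shape (49,) for a 7x7 window
```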
get_valid_direction: Determines legal moves
Prevents 180-degree turns
Returns a list of valid direction vectors
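Since a 180-degree turn is just the negation of the current direction vector, filtering it out takes one comparison. A sketch, assuming directions are (dx, dy) tuples:

```python
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right

def get_valid_directions(current_direction):
    """Every direction except the exact reverse of the current one."""
    dx, dy = current_direction
    return [d for d in DIRECTIONS if d != (-dx, -dy)]
```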
get_direction: Chooses the next action
Implements epsilon-greedy strategy
Handles exploration vs exploitation
Ensures only valid moves are selected
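Epsilon-greedy means: with probability epsilon, pick a random valid move (exploration); otherwise pick the valid move with the highest predicted Q-value (exploitation). A sketch, building on the DIRECTIONS list and helper from the previous snippet:

```python
import random
import torch

def get_direction(model, state, current_direction, epsilon):
    """Epsilon-greedy action selection restricted to valid moves."""
    valid = get_valid_directions(current_direction)
    if random.random() < epsilon:
        return random.choice(valid)  # explore
    with torch.no_grad():
        q_values = model(torch.tensor(state).unsqueeze(0)).squeeze(0)
    # Exploit: among valid moves, take the one with the highest Q-value.
    return max(valid, key=lambda d: q_values[DIRECTIONS.index(d)].item())
```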
remember: Stores experiences in replay buffer
Saves state, action, reward, next state, done flag
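The replay buffer is typically nothing fancier than a bounded deque of tuples, something like:

```python
from collections import deque

memory = deque(maxlen=10_000)  # oldest experiences fall off the end

def remember(state, action, reward, next_state, done):
    """Store one transition for later batch sampling."""
    memory.append((state, action, reward, next_state, done))
```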
replay: Handles the training process
Samples a random batch of experiences
Calculates target Q-values
Updates the neural network
Decays the exploration rate
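The targets come from the Bellman equation: for a non-terminal transition the target is reward + gamma * max Q(next_state, a), and just the reward when the episode ended. Here's a sketch of one training step, assuming the memory deque above, actions stored as indices into DIRECTIONS, and an optimizer such as torch.optim.Adam(model.parameters()):

```python
import random
import numpy as np
import torch
import torch.nn as nn

def replay(model, optimizer, batch_size=64, gamma=0.95):
    """Sample a random batch from memory and take one gradient step."""
    if len(memory) < batch_size:
        return  # not enough experience collected yet
    batch = random.sample(memory, batch_size)
    states = torch.tensor(np.array([b[0] for b in batch]))
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.tensor(np.array([b[3] for b in batch]))
    dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    # Q-values the network currently assigns to the actions we actually took.
    q_pred = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bellman targets: reward, plus the discounted best next-state value
    # unless the episode ended there.
    with torch.no_grad():
        q_next = model(next_states).max(dim=1).values
    q_target = rewards + gamma * q_next * (1.0 - dones)

    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The exploration-rate decay (e.g. epsilon = max(epsilon_min, epsilon * 0.995)) would run at the end of this function, gradually shifting the agent from exploring to exploiting what it has learned.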
train: Manages a single training episode
Tracks episode progress
Handles action selection and execution
Manages rewards and experience storage
Triggers periodic training
Returns episode metrics
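Tying it together, a single episode might look roughly like this. The game interface (the reset and step methods, the reward values, the replay frequency) is entirely hypothetical and just illustrates the flow; the real train method drives the actual Tron game loop:

```python
def train_episode(agent, game):
    """Run one episode and return (total_reward, steps survived)."""
    game.reset()
    state = agent.get_state(game)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = agent.get_direction(state)
        done = game.step(action)            # hypothetical: True once we crash
        next_state = agent.get_state(game)
        reward = -10.0 if done else 0.1     # crash penalty vs. survival bonus
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward
        steps += 1
        if steps % 4 == 0:                  # periodic training
            agent.replay()
    return total_reward, steps
```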
save_model: Saves trained weights
load_model: Loads pre-trained weights
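Both of these are thin wrappers around PyTorch's state-dict utilities; the file name here is just a placeholder:

```python
import torch

def save_model(model, path="tron_dqn.pth"):
    torch.save(model.state_dict(), path)

def load_model(model, path="tron_dqn.pth"):
    model.load_state_dict(torch.load(path))
    model.eval()  # inference mode when playing with a trained model
```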
This class will act as a template for future improvements. When you run it, you will notice it performs rather poorly. Next week we will continue improving our Tron bot.
Thanks for reading :)