Tron - Week 6 beginner project

    Welcome to week 6!

    By AI Club on 10/30/2024

    Week 6: Implementing Reinforcement Learning in Tron

    Introduction

    Last week, we built a foundation for AI integration in our Tron game using a simple random-movement AI. This week, we're taking a significant leap forward by implementing reinforcement learning (RL). We'll modify our game to support training an AI agent that learns to survive longer in the game environment.

    Implementation Overview

    We'll be updating several files and creating new ones. Here's what we'll be doing:

    1. Modifying the Player class to work with our RL agent

    2. Creating a new training script

    3. Updating our game runner

    4. Implementing the RL agent itself

    Let's go through each file:

    Updated player.py

    player.py

    New run_game.py

    run_game.py

    Training script, train.py

    train.py
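
    If you want a feel for what the training script does before downloading it, a typical driver loop looks roughly like the sketch below. The episode count, save interval, and model file name are placeholder assumptions, and the exact RLAgent constructor and train() signature may differ from what's shown; the downloaded train.py is the authoritative version.

        # Hypothetical driver loop; names and numbers are illustrative.
        from rl_ai import RLAgent

        def main(num_episodes=1000, save_every=100):
            agent = RLAgent()
            for episode in range(num_episodes):
                # agent.train runs one full game and returns episode
                # metrics (see the method breakdown below).
                metrics = agent.train()
                print(f"Episode {episode}: {metrics}")
                if (episode + 1) % save_every == 0:
                    agent.save_model("tron_dqn.pth")

        if __name__ == "__main__":
            main()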

    Creating the RL Agent

    Download the following file:

    rl_ai.py

    Understanding the code

    Let's break down our DQN implementation piece by piece to understand how it works.

    Overview

    The DQN implementation consists of two main classes: DQN for the neural network architecture and RLAgent for the reinforcement learning logic. Let's understand how each class works and how they interact.

    DQN Class

    The DQN class defines our neural network architecture that will learn to predict Q-values for each possible action in a given state.

    Structure

    • Inherits from PyTorch's nn.Module

    • Uses three fully connected layers

    • Takes flattened game state as input

    • Outputs Q-values for each possible direction

    Methods

    • __init__: Sets up the neural network layers

    • forward: Defines how input data flows through the network
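
    Putting those pieces together, a minimal sketch of such a network in PyTorch looks like the following. The hidden-layer sizes (128 and 64) are illustrative assumptions, not necessarily the values used in rl_ai.py:

        import torch
        import torch.nn as nn

        class DQN(nn.Module):
            """Maps a flattened game state to one Q-value per direction."""

            def __init__(self, input_size, output_size):
                super().__init__()
                # Three fully connected layers; hidden sizes are placeholders.
                self.fc1 = nn.Linear(input_size, 128)
                self.fc2 = nn.Linear(128, 64)
                self.fc3 = nn.Linear(64, output_size)

            def forward(self, x):
                # ReLU between layers; the output layer stays linear
                # because Q-values are unbounded.
                x = torch.relu(self.fc1(x))
                x = torch.relu(self.fc2(x))
                return self.fc3(x)

    With a 7x7 window flattened to 49 inputs and four movement directions, you would construct it as DQN(input_size=49, output_size=4).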

    RLAgent Class

    The RLAgent class handles all reinforcement learning logic, including state processing, action selection, and training. Condensed code sketches of the core methods follow the summaries below.

    Methods

    • get_state: Converts the game board into a numerical representation

      • Creates a 7x7 window around the player

      • Represents different elements (empty space, walls, trails)

      • Returns flattened state array

    • get_valid_direction: Determines legal moves

      • Prevents 180-degree turns

      • Returns a list of valid direction vectors

    • get_direction: Chooses the next action

      • Implements epsilon-greedy strategy

      • Handles exploration vs exploitation

      • Ensures only valid moves are selected

    • remember: Stores experiences in replay buffer

      • Saves state, action, reward, next state, done flag

    • replay: Handles the training process

      • Samples a random batch of experiences

      • Calculates target Q-values

      • Updates the neural network

      • Decays the exploration rate

    • train: Manages a single training episode

      • Tracks episode progress

      • Handles action selection and execution

      • Manages rewards and experience storage

      • Triggers periodic training

      • Returns episode metrics

    • save_model: Saves trained weights

    • load_model: Loads pre-trained weights
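
    To make the summaries above concrete, here is a condensed, free-standing sketch of the state-encoding and action-selection logic. The cell encodings (0 = empty, 1 = wall, 2 = trail), the board representation, and the exact signatures are assumptions for illustration; refer to rl_ai.py for the real code.

        import random
        import numpy as np
        import torch

        DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right

        def get_state(board, x, y, window=7):
            """Encode a window x window view around the player as a flat array."""
            half = window // 2
            state = np.zeros((window, window), dtype=np.float32)
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    bx, by = x + dx, y + dy
                    if not (0 <= bx < len(board[0]) and 0 <= by < len(board)):
                        state[dy + half, dx + half] = 1.0  # off-board counts as wall
                    elif board[by][bx] != 0:
                        state[dy + half, dx + half] = 2.0  # trail
            return state.flatten()

        def get_valid_directions(current):
            """Every direction except the 180-degree reversal of `current`."""
            return [d for d in DIRECTIONS if d != (-current[0], -current[1])]

        def get_direction(model, state, current, epsilon):
            """Epsilon-greedy choice restricted to valid moves."""
            valid = get_valid_directions(current)
            if random.random() < epsilon:
                return random.choice(valid)  # explore
            with torch.no_grad():
                q = model(torch.from_numpy(state))  # one Q-value per direction
            # Exploit: highest-Q direction among the valid ones.
            return max(valid, key=lambda d: q[DIRECTIONS.index(d)].item())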
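
    And here is an equally condensed sketch of the experience-replay machinery behind remember and replay. Buffer capacity, batch size, discount factor, and the epsilon-decay schedule are placeholder values, not necessarily those in rl_ai.py:

        import random
        from collections import deque
        import torch
        import torch.nn.functional as F

        memory = deque(maxlen=10_000)  # replay buffer; capacity is a placeholder

        def remember(state, action, reward, next_state, done):
            """Store one transition; `action` is an index into DIRECTIONS."""
            memory.append((state, action, reward, next_state, done))

        def replay(model, optimizer, epsilon, batch_size=64, gamma=0.99,
                   eps_decay=0.995, eps_min=0.01):
            """One training pass over a random batch; returns the decayed epsilon."""
            if len(memory) < batch_size:
                return epsilon
            for state, action, reward, next_state, done in random.sample(memory, batch_size):
                # Bellman target: r if terminal, else r + gamma * max_a' Q(s', a').
                target = float(reward)
                if not done:
                    with torch.no_grad():
                        target += gamma * model(torch.from_numpy(next_state)).max().item()
                q_values = model(torch.from_numpy(state))
                loss = F.mse_loss(q_values[action], torch.tensor(target))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Gradually shift from exploration toward exploitation.
            return max(eps_min, epsilon * eps_decay)

    Updating one transition at a time keeps the sketch easy to read; a real implementation would typically stack the batch into tensors and do a single vectorized update.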

    What this means

    This class will act as a template for future improvements. When you run it, you will notice it performs rather poorly. Next week we will continue improving our Tron bot.

    Thanks for reading :)
