Maze Scenario

The Maze scenario is a classic problem in pathfinding and reinforcement learning. The agent is placed in a two-dimensional maze and must find its way to an exit. Once the exit is found, the agent is repositioned randomly within the maze and must find the exit again. This cycle continues until the episode ends.

This scenario tests the agent’s ability to learn spatial navigation, pathfinding, and memory. As the agent explores the maze, it can build up an internal model of the environment and use it to find the most efficient route to the exit.
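As an illustration of optimizing a route over a learned model, the sketch below runs a breadth-first search over the cells the agent has already explored. The `passages` dictionary and the `shortest_path` helper are hypothetical names chosen for this example and are not part of the scenario's API. It assumes the agent already knows the exit cell from an earlier visit.

```python
from collections import deque

# Hypothetical representation: `passages` maps each known cell (x, y) to the
# set of neighbouring cells the agent has confirmed it can step into.
def shortest_path(passages, start, goal):
    """Return the list of cells from start to goal, or None if unreachable."""
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        for nxt in passages.get(cell, ()):
            if nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None

# A tiny hand-written map, used only to exercise the function.
known = {
    (0, 0): {(1, 0)},
    (1, 0): {(0, 0), (1, 1)},
    (1, 1): {(1, 0), (2, 1)},
    (2, 1): {(1, 1)},
}
print(shortest_path(known, (0, 0), (2, 1)))
```

Because every move costs one step, breadth-first search returns a route with the fewest moves among the cells explored so far. Unexplored shortcuts are invisible to it, so some exploration is still needed.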

In some cases, the maze construction will introduce biases that can be learned and exploited.

Goal

Write an AI that can navigate through a maze environment, find the exit, and optimize paths to maximize rewards within the limited number of available moves.

Algorithms and Hints

Consider implementing algorithms such as:

Setup

At the start of each episode, the agent is placed in a maze with a defined size and structure. The agent must navigate through the maze to find the exit. When the exit is found, the agent is repositioned randomly within the maze.

The following options are available when setting up the scenario:

| Key | Value |
| --- | --- |
| mazeSize | The size of the maze (SMALL, MEDIUM, LARGE) |
| mazeType | The style/generation algorithm of the maze (BINARYTREE, SIDEWINDER, RECURSIVEBACKTRACKER, BRAIDED) |
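The options listed above could be assembled and validated on the client side before a session is requested. The following is a minimal sketch. `make_options` is a hypothetical helper, and how the resulting dictionary is actually submitted to the simulation is not described here, so that step is left out.

```python
# Allowed values, copied from the options listed above.
MAZE_SIZES = {"SMALL", "MEDIUM", "LARGE"}
MAZE_TYPES = {"BINARYTREE", "SIDEWINDER", "RECURSIVEBACKTRACKER", "BRAIDED"}

def make_options(maze_size, maze_type):
    """Hypothetical helper: build an options dict, rejecting unknown values."""
    if maze_size not in MAZE_SIZES:
        raise ValueError(f"unknown mazeSize: {maze_size!r}")
    if maze_type not in MAZE_TYPES:
        raise ValueError(f"unknown mazeType: {maze_type!r}")
    return {"mazeSize": maze_size, "mazeType": maze_type}

options = make_options("SMALL", "BRAIDED")
print(options)
```

Validating early keeps a typo in an option name from surfacing later as a harder-to-diagnose setup failure.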

Protocol

The protocol is detailed in Maze.proto and follows a standard State -> Action -> Result pattern.

Maze State

| Component | Data Type | Description |
| --- | --- | --- |
| sessionID | string | Unique identifier for this simulation run |
| episodeID | string | Unique identifier for the current episode |
| startX | int32 | The horizontal (x) coordinate of the agent |
| startY | int32 | The vertical (y) coordinate of the agent |
| movesLeft | int32 | Number of moves remaining before the episode ends |
| width | int32 | The width of the maze (number of cells horizontally) |
| height | int32 | The height of the maze (number of cells vertically) |

Maze Action

| Component | Data Type | Description |
| --- | --- | --- |
| direction | Direction | The direction in which the agent chooses to move (NORTH, SOUTH, EAST, WEST) |
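To plan ahead, an agent needs to know which cell each direction leads to. The sketch below shows one way to model that. The `Direction` enum here is a local stand-in for the type defined in Maze.proto, and the coordinate convention (y increasing toward SOUTH) is an assumption that should be checked against the end positions the simulation actually reports.

```python
from enum import Enum

class Direction(Enum):
    # Local stand-in for the Direction type; names follow the action description.
    NORTH = "NORTH"
    SOUTH = "SOUTH"
    EAST = "EAST"
    WEST = "WEST"

# ASSUMPTION: x grows toward EAST and y grows toward SOUTH.
# If the simulation uses the opposite vertical convention, swap the y signs.
DELTAS = {
    Direction.NORTH: (0, -1),
    Direction.SOUTH: (0, 1),
    Direction.EAST: (1, 0),
    Direction.WEST: (-1, 0),
}

def expected_position(x, y, direction):
    """Cell the agent would reach if the move is not blocked."""
    dx, dy = DELTAS[direction]
    return x + dx, y + dy

print(expected_position(2, 3, Direction.EAST))
```

Comparing `expected_position` with the reported end position after the first few successful moves is a cheap way to confirm or correct the assumed orientation.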

Maze Result

| Component | Data Type | Description |
| --- | --- | --- |
| startX | int32 | The x position before movement |
| startY | int32 | The y position before movement |
| direction | Direction | The direction the agent moved |
| endX | int32 | The x position after movement |
| endY | int32 | The y position after movement |
| stepScore | double | The score received for this specific move |
| accumulatedScore | double | The total score accumulated so far in this episode |
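Each result can be folded into a map of the maze as the agent moves. The sketch below is one possible approach, built on two assumptions that are not confirmed here: a blocked move leaves the agent where it was (end equals start), and a jump to a non-adjacent cell means the agent was repositioned after reaching the exit. `MazeResult` is a plain Python stand-in for the protocol message, and `record_result` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MazeResult:
    # Field names follow the Maze Result fields; direction is a plain string here.
    startX: int
    startY: int
    direction: str
    endX: int
    endY: int
    stepScore: float
    accumulatedScore: float

def record_result(passages, walls, result):
    """Update the learned map and classify the move (assumptions noted above)."""
    start = (result.startX, result.startY)
    end = (result.endX, result.endY)
    if end == start:
        # ASSUMPTION: no change in position means the move hit a wall.
        walls.add((start, result.direction))
        return "blocked"
    if abs(end[0] - start[0]) + abs(end[1] - start[1]) == 1:
        # An ordinary step: record the passage in both directions.
        passages[start].add(end)
        passages[end].add(start)
        return "moved"
    # ASSUMPTION: a non-adjacent end position means a random reposition.
    return "repositioned"

passages = defaultdict(set)
walls = set()
# Scores below are placeholder values for the example only.
print(record_result(passages, walls, MazeResult(0, 0, "EAST", 1, 0, 0.0, 0.0)))
print(record_result(passages, walls, MazeResult(1, 0, "NORTH", 1, 0, 0.0, 0.0)))
```

The `passages` map built this way has the shape a graph search expects, so the same structure can serve both exploration and route planning.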