Maze Scenario
The Maze scenario is a classic problem in path finding and reinforcement learning. The agent is placed in a two-dimensional maze and must find its way to an exit. Once the exit is found, the agent is repositioned randomly within the maze and must find the exit again. This cycle continues until the episode ends.
This scenario tests the agent’s ability to learn spatial navigation, path finding, and memory. As the agent explores the maze, it can build up a mental model of the environment and optimize its path to find the most efficient route to the exit.
In some cases, the maze construction will introduce biases that can be learned and exploited.
Goal
Write an AI that can navigate through a maze environment, find the exit, and optimize paths to maximize rewards within the limited number of available moves.
Algorithms and Hints
Consider implementing algorithms such as:
- Random exploration for initial maze discovery
- Depth-first search or breadth-first search for systematic exploration
- A* or Dijkstra’s algorithm for finding optimal paths once the maze layout is known
- Memory systems to track visited locations and build a map of the maze
Setup
At the start of each episode, the agent is placed in a maze with a defined size and structure. The agent must navigate through the maze to find the exit. When the exit is found, the agent is repositioned randomly within the maze.
The following options are available when setting up the scenario:
| Key | Value |
|---|---|
| mazeSize | The size of the maze (SMALL, MEDIUM, LARGE) |
| mazeType | The style/generation algorithm of the maze (BINARYTREE, SIDEWINDER, RECURSIVEBACKTRACKER, BRAIDED) |
Protocol
The protocol is detailed in Maze.proto and follows a standard State -> Action -> Result pattern.
Maze State
| Component | Data Type | Description |
|---|---|---|
| sessionID | string | Unique identifier for this simulation run |
| episodeID | string | Unique identifier for the current episode |
| startX | int32 | The horizontal (x) coordinate of the agent |
| startY | int32 | The vertical (y) coordinate of the agent |
| movesLeft | int32 | Number of moves remaining before the episode ends |
| width | int32 | The width of the maze (number of cells horizontally) |
| height | int32 | The height of the maze (number of cells vertically) |
Maze Action
| Component | Data Type | Description |
|---|---|---|
| direction | Direction | The direction in which the agent chooses to move (NORTH, SOUTH, EAST, WEST) |
Maze Result
| Component | Data Type | Description |
|---|---|---|
| startX | int32 | The x position before movement |
| startY | int32 | The y position before movement |
| direction | Direction | The direction the agent moved |
| endX | int32 | The x position after movement |
| endY | int32 | The y position after movement |
| stepScore | double | The score received for this specific move |
| accumulatedScore | double | The total score accumulated so far in this episode |