Legacy code - version 1.2.1

The original AI Sandbox featured a REST based protocol and a “client” application. The sandbox app called out to a REST server API (which you developed), this made it easy for developers who were used to server-side programming and Swagger specifications, to start their AI journey.

Although development has moved to a new code base, the original downloads are still available.

The following documentation is taken from the original website:

Getting Started

Step 1 - download and run the client.

The easiest way to run the client is to download a platform-specific installer for Windows, Linux or Mac. Installing this way will add the application to your start menu and / or create a desktop launcher.

Alternatively, if you have a recent (version 11 or later) Java Runtime Environment installed, you can download the cross-platform JAR and launch the application with the command:

java -jar AISandbox-Client-version.jar

where version matches the version number you downloaded.

Note: due to the architecture restrictions of JavaFX, the JAR file will only run on Windows, Linux and Mac.

Step 2 - setup the simulation

After choosing a simulation to run, you will be shown a screen similar to this:

Main UI

On the left hand side are options to tailor the simulation. This will be different for each scenario. The top right is where you configure your agents. Each agent is defined as a URL which the client will connect to.

Add an agent by pressing the add agent button then configure it by double-clicking on the URL.

Edit the agent details

On the edit agent screen, you can change the agent endpoint as well as add authentication and choose between REST-JSON and REST-XML.

Returning to the main screen, the bottom right of the window will allow you to set up any output options. This includes the ability to take screenshots for any frame or export the output of the simulation as a video.

Step 3 Run Simulation

Clicking on ‘next’ will take you to the run screen

Pressing start will initialise the simulation, the output of which is shown in the main viewscreen.

The graphs at the bottom will summarise how your agent is running.

Scenarios

Multi-Armed Bandits

The Multi-Armed Bandit is one of the classic problems studied in probability theory and AI and it’s one of the first problems you are likely to look at when doing a Reinforcement Learning class. The basic premise is that you have a row of one-armed bandit machines, each tuned to pay out a reward based on a different random variable. When playing you have to decide between choosing to play with the bandit that you think will give you the highest reward, and playing with another bandit which will allow you to build up a more accurate picture of the rewards available.

This trade-off between exploration and exploitation can be seen in many real-world scenarios and research on advanced versions of the Multi-Armed Bandit problem is still ongoing nearly seventy years after the problem was first explored.

For more background information see the Wikipedia Article.

Goal

Write an AI that learns then selects the bandit which returns the highest reward.

Algorithms and Hints

Setup

At the start of each round, each bandit is assigned a different random variable with a normal distribution N(μ,σ²).

The following options are available when setting up the scenario:

Parameter Description
# Bandits The number of bandits available, numbered 0…n-1.
# Pulls The number of arm-pulls in each round.
Bandit Mean How the mean of each bandit is chosen
Bandit Standard Deviation How the standard deviation of each bandit is chosen
Update Rule How the mean of each bandit is updated after each step
Random Salt A repeatable salt for the random number generator – use ‘0’ to create a new value each time
Skip intermediate frames If checked the screen (and output) will only be updated at the end of each round, rather than after every pull. This dramatically reduces the amount of time spend updating the UI.

API Interface

The Swagger specification for this API can be downloaded here.

In each step the Sandbox Client will send the following to the AI server:

Note: If the number of pulls in a round is 100, then the pull parameter will range from 0 to 99.

JSON Example Request

{
  "history": {
    "sessionID": "DEF-5435",
    "chosenBandit": 4,
    "reward": 1.42
  },
  "sessionID": "ABC-23542342",
  "banditCount": 10,
  "pullCount": 10000,
  "pull": 0
}

JSON Example Response

{
  "arm": 3
}

XML Example Request

<?xml version="1.0" encoding="UTF-8"?>
<BanditRequest>
  <history>
    <sessionID>DEF-5435</sessionID>
    <chosenBandit>4</chosenBandit>
    <reward>1.42</reward>
  </history>
  <sessionID>ABC-23542342</sessionID>
  <banditCount>10</banditCount>
  <pullCount>10000</pullCount>
  <pull>0</pull>
</BanditRequest>

XML Example Response

<?xml version="1.0" encoding="UTF-8"?>
<BanditResponse>
  <arm>3</arm>
</BanditResponse>

Graphical Output

The graphical output shows three charts:

Average reward – the reward from each step of the round, averaged over all rounds.

Optimal action – the average percentage that the AI picks the best bandit (the bandit with the highest mean value) at each step in the round.

The current mean (centre of each diamond) and standard deviation (top and bottom of each diamond) of each of the bandits.

Statistics Output

The statistics file will show the average reward and the amount of times each step has chosen the best answer, averaged over all rounds.

Step Ave Reward % Optimal Action
0 0.11180486767982283 8.0
1 -0.16001485097160476 6.000000000000003
2 0.01309801032724169 6.000000000000001

Mazes

The Maze scenario places the AI agent inside a randomly generated maze. The agent can move about the maze, bumping into walls and exploring the paths, until it finds the finish square. Once at the finish square, the agent is returned to the beginning to repeat the process. The goal is to explore the maze and find the shortest path from start to finish. Unlike in a physical maze the start and finish are not always placed at the edges.

Mazes can be generated in several different styles and sizes.

Binary Tree

Binary Tree

A binary tree maze is the simplest of the mazes implemented but contains a heavy diagonal bias.

Sidewinder

Sidewinder

While still biased (the top row will always be a single run of open squares) this looks more like a typical maze than the Binary Tree.

Recursive Backtracker

Recursive Backtracker

This is the maze that most people will recognise, it has many long twisting passages which makes for long solutions.

Braided

Braded

The braided maze is similar to the recursive backtracker, but with all dead-ends removed. Because of this the maze may feature loops and open areas. This can create challenges for some AIs as there may be multiple routes to the same point.

Algorithms and Hints

Further Challenges

API Interface

The Swagger specification for this API can be downloaded ( here)[https://files.aisandbox.dev/swagger/maze.yaml].

The API is implemented as a single POST action which contains:

The result is always a move command followed by one of the valid directions.

Notes:

  1. Positions are represented in X, Y coordinates with (0,0) at the top left.
  2. Agents are automatically reset to the start position after finding the exit. Because of this the

JSON Example Request

{
  "config": {
    "boardID": "2e2e83e5-9ca4-48f3-a6b4-ba1e0061fdfa",
    "validMoves": [
      [
        "North",
        "South",
        "East",
        "West"
      ]
    ],
    "width": 10,
    "height": 20
  },
  "history": {
    "lastPosition": {
      "x": 3,
      "y": 4
    },
    "action": "East",
    "reward": -10.5,
    "newPosition": {
      "x": 3,
      "y": 4
    }
  },
  "currentPosition": {
    "x": 3,
    "y": 4
  }
}

JSON Example Response

{
  "move": "North"
}

XML Example Request

<?xml version="1.0" encoding="UTF-8"?>
<MazeRequest>
  <config>
    <boardID>2e2e83e5-9ca4-48f3-a6b4-ba1e0061fdfa</boardID>
    <validMoves>
      <0>North
    </0>
  </validMoves>
  <width>10</width>
  <height>20</height>
</config>
<history>
<lastPosition>
  <x>3</x>
  <y>4</y>
</lastPosition>
<action>East</action>
<reward>-10.5</reward>
<newPosition>
  <x>3</x>
  <y>4</y>
</newPosition>
</history>
<currentPosition>
<x>3</x>
<y>4</y>
</currentPosition>
  </MazeRequest>

XML Example Response

<?xml version="1.0" encoding="UTF-8"?>
<MazeResponse>
  <move>North</move>
</MazeResponse>

Mine Hunter

Mine Hunter pits the AI against a minefield! A known quantity of mines has been distributed across a grid of squares and the AI agent must work out where they are. To help, each uncovered square will show how many mines are in the surrounding squares.

The grid is represented as an array of characters:

Character Description
F A flagged mine
f An incorrectly flagged empty space (game over)
# A covered square
X An exploded mine (game over)
. An uncovered square with no neighbouring mines
1-8 An uncovered square with n neighbouring mines

and organised in a two-dimensional array (with 0,0 at the top left).

Example Board

Sample Board

JSON Representation

"board": [
"#1",
"#1",
"11"
]

XML Representation


<board>#1</board>
<board>#1</board>
<board>11</board>

Note: in this example the correct response would be to place a flag at 0,1.

Goal

Write an AI agent that can solve as many boards as possible, this can be measured by a moving average which is shown on the output.

Algorithms and Hints

There are quite a few algorithms that can be used to solve this scenario:

API Interface

The Swagger specification for this API can be downloaded here.

In each step, the Sandbox Client will send the following information to the AI server:

The AI agent should respond with one or more moves, each move consists of:

Note: there is no limit to the number of moves that can be included in each response. However, if a move results in the game ending, subsequent guesses will be ignored rather than carried over to the next game.

JSON Example Request

{
  "lastMove": {
    "boardID": "1233-5678-90abc-2435",
    "result": "LOST"
  },
  "boardID": "1234-1234-1234-1234",
  "board": [
    "##1..",
    "##1..",
    "221..",
    "#1..."
  ],
  "flagsRemaining": 3
}

JSON Example Response

{
  "moves": [
    {
      "x": 1,
      "y": 1,
      "flag": true
    }
  ]
}

XML Example Request

<?xml version="1.0" encoding="UTF-8"?>
<MineRequest>
  <lastMove>
    <boardID>1233-5678-90abc</boardID>
    <result>LOST</result>
  </lastMove>
  <boardID>1234-1234-1234-1234</boardID>
  <board>##1..</board>
  <board>##1..</board>
  <board>221..</board>
  <board>#1...</board>
  <flagsRemaining>3</flagsRemaining>
</MineRequest>

XML Example Response

<?xml version="1.0" encoding="UTF-8"?>
<MineResponse>
  <moves>
    <x>1</x>
    <y>1</y>
    <flag>true</flag>
  </moves>
</MineResponse>

Twisty Puzzles

Twisty puzzles are a subset of combination puzzles, where the AI has to unscramble a three-dimensional puzzle by rotating (or twisting) the sides until each side is a solid block of colour. Most famously these include the Rubix Cube, (invented by Ernő Rubik in 1974,) a cube made of 26 smaller cubes that rotate around a central point; but the family of puzzles have expanded to include all sorts of shapes.

Notation and Scoring

A lot of confusion for these puzzles is generated by the labelling of the moves and the different scoring systems available. Here (as far as possible) we have adopted the World Cube Association standards as set out in the WCA regulations, (section 12) and use the outer block turn metric (OBTM) to count moves.

Notation for NxNxN Cubes (From WCA):

Note – Although the WCA notes that dropping the “2” prefix for outer block moves when n=2 is optional, here it is done in all cases.

Algorithms and Hints

Although in theory all twisty puzzles could be solved by a simple search algorithm, (as all future states can be accurately predicted) in reality all but the smallest puzzles have far too many possible combinations actually do this.

Puzzle Type Positions (Possible States)
Cube 2x2x2 3,674,160
Cube 3x3x3 43,252,003,274,489,856,000
Cube 4x4x4 7.4 × 1045
Cube 5x5x5 2.8 × 1074
Cube 6x6x6 1.6 × 10116
Cube 7x7x7 2.0 × 10160

Instead, you could try splitting the puzzle into several smaller tasks (i.e. top layer, sides, bottom corners, whole cube) with the solution to each sub-puzzle keeping the previous state intact.

Alternatively, it has been shown that by using a deep neural network to predict how many moves a state is away from a solution, existing search algorithms (such as A*) can be adapted to produce good results.

API Interface

The Swagger specification for this API can be downloaded here.

The API is implemented as a single POST action which contains:

The response should be the desired move, expressed as a string.

Making multiple moves

To make multiple moves in the same network response, separate the moves by spaces. For example:

<move>R D' R' U2</move>

The history object of the next request will only show the last move made.

State Encoding

The state of the puzzle is encoded as a string, with each character representing the colour of a single face.

Character Colour
‘W’ White
‘O’ Orange
‘G’ Green
‘R’ Red
‘B’ Blue
‘Y’ Yellow

Each face is encoded row by row, from top left to bottom right.

Character encoding of a 2x2x2 Cube

Character encoding of a 3x3x3 Cube

JSON Example Request

{
"history": {
"startState": "OYWYYBRYOYBYRRRWRRBRYGGYGGRGOBGOOYBGOOBGBBOORWWGWWWBWW",
"moves": "D'",
"endState": "OYWYYBRYOYBYRRRGGRBRYGGYYBGGOBGOOOOROOBGBBWRRGWWWWWWWB",
"success": false
},
"puzzleType": "Cube 3x3x3 (OTBM)",
"moves": [
"R", "L", "U","D","R'","L'"
],
"state": "OYWYYBRYOYBYRRRGGRBRYGGYYBGGOBGOOOOROOBGBBWRRGWWWWWWWB"
}

###JSON Example Response

{
"move": "R D R'"
}

XML Example Request

<?xml version="1.0" encoding="UTF-8"?>
<TwistyRequest>
 <history>
<startState>OYWYYBRYOYBYRRRWRRBRYGGYGGRGOBGOOYBGOOBGBBOORWWGWWWBWW</startState>
  <moves>D'</moves>
<endState>OYWYYBRYOYBYRRRGGRBRYGGYYBGGOBGOOOOROOBGBBWRRGWWWWWWWB</endState>
  <success>false</success>
</history>
<puzzleType>Cube 3x3x3 (OTBM)</puzzleType>
<moves>R</moves>
<moves>L</moves>
<moves>U</moves>
<moves>D</moves>
<moves>R'</moves>
<moves>L'</moves>
<moves>U'</moves>
<moves>D'</moves>
<state>OYWYYBRYOYBYRRRGGRBRYGGYYBGGOBGOOOOROOBGBBWRRGWWWWWWWB</state>
</TwistyRequest>

XML Example Response

<?xml version="1.0" encoding="UTF-8"?>
<TwistyResponse>
	<move>R D R'</move>
</TwistyResponse>