https://github.com/eczy/box-world-mavischer
- Host: GitHub
- URL: https://github.com/eczy/box-world-mavischer
- Owner: eczy
- Created: 2021-05-19T23:28:17.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-05-19T23:31:55.000Z (over 3 years ago)
- Last Synced: 2024-10-27T17:37:57.617Z (3 months ago)
- Language: Python
- Size: 717 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Box-World
## Introduction
Gym implementation of the Box-World environment from the paper "Relational Deep Reinforcement Learning" (https://arxiv.org/pdf/1806.01830.pdf), which is made to explicitly target relational reasoning.
| Example Game 1 | Example Game 2 | Example Game 3 |
| :---: | :---: | :---: |
| ![Game 1](./examples/round_1.gif?raw=true) | ![Game 2](./examples/round_2.gif?raw=true) | ![Game 3](./examples/round_0.gif?raw=true) |

It is a perceptually simple but combinatorially complex environment that requires abstract relational reasoning and planning. It consists of an n × n pixel room with keys and boxes randomly scattered. The room also contains an agent, represented by a single dark gray pixel, which can move in four directions: up, down, left, right. Keys are represented by a single colored pixel. The agent can pick up a loose key (i.e., one not adjacent to any other colored pixel) by walking over it. Boxes are represented by two adjacent colored pixels: the pixel on the right represents the box’s lock, and its color indicates which key can be used to open that lock; the pixel on the left indicates the content of the box, which is inaccessible while the box is locked.
To collect the content of a box the agent must first collect the key that opens the box (the one that matches the lock’s color) and walk over the lock, which makes the lock disappear. At this point, the content of the box becomes accessible and can be picked up by the agent. Most boxes contain keys that, if made accessible, can be used to open other boxes. One of the boxes contains a gem, represented by a single white pixel. The goal of the agent is to collect the gem by unlocking the box that contains it and picking it up by walking over it. The key that an agent has in possession is depicted in the input observation as a pixel in the top-left corner. In each level, there is a unique sequence of boxes that need to be opened to reach the gem. Opening one wrong box (a distractor box) leads to a dead-end where the gem cannot be reached and the level becomes unsolvable.
Four user-controlled parameters contribute to the difficulty of the level:
- The size of the board, and thus the size of the state space
- The number of boxes in the path to the goal (solution length)
- The number of distractor branches
- The length of the distractor branches

In general, the task is computationally difficult for a few reasons. First, a key can only be used once, so the agent must be able to reason about whether a particular box is along a distractor branch or the solution path. Second, keys and boxes appear in random locations in the room, emphasizing a capacity to reason about keys and boxes based on their abstract relations, rather than based on their spatial positions.
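The single-use key rule and the dead-end effect of a distractor box can be illustrated with a small, self-contained model. This is only a sketch and not the code of this repository: the `Box` tuple, the `open_boxes` helper, and the color names are hypothetical, the agent is assumed to hold at most one key at a time, and positions in the room are ignored entirely.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical abstract box: positions in the room are ignored."""
    content: str  # color of the key inside the box, or "gem"
    lock: str     # color of the key that opens the box


def open_boxes(start_key: str, boxes: List[Box], order: List[int]) -> bool:
    """Try to open boxes in the given order; return True if the gem is collected.

    Assumption: the agent holds at most one key. Opening a box consumes
    that key, and the agent then picks up the content of the box.
    """
    held: Optional[str] = start_key
    for idx in order:
        box = boxes[idx]
        if held != box.lock:
            return False  # the held key does not fit this lock
        held = box.content  # key is used up and replaced by the content
        if held == "gem":
            return True
    return False


# Boxes 0 and 1 form the solution path; box 2 is a distractor
# that can be opened with the same red key as box 0.
example_boxes = [
    Box(content="blue", lock="red"),
    Box(content="gem", lock="blue"),
    Box(content="green", lock="red"),
]

solved = open_boxes("red", example_boxes, [0, 1])    # follows the solution path
dead_end = open_boxes("red", example_boxes, [2, 1])  # spends the red key on the distractor
```

In this model, spending the red key on the distractor leaves the agent with a green key that opens nothing on the solution path, which is why a single wrong box makes the level unsolvable.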
## Actions
The game provides 4 actions to interact with the environment.
The mapping of action IDs to actions is as follows:

| Action | ID |
| -------- | :---: |
| Move Up | 0 |
| Move Down | 1 |
| Move Left | 2 |
| Move Right | 3 |
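
For use in agent code, the table above can be written as a lookup from action ID to grid displacement. This is a sketch under stated assumptions: positions are taken to be `(row, column)` pairs with row 0 at the top, and a move that would leave the room is assumed to leave the agent in place. Neither convention is confirmed by this README, and `ACTION_DELTAS` and `move` are hypothetical names.

```python
from typing import Dict, Tuple

# Action IDs as listed in the table; the (row, col) deltas are an assumption.
ACTION_DELTAS: Dict[int, Tuple[int, int]] = {
    0: (-1, 0),  # Move Up
    1: (1, 0),   # Move Down
    2: (0, -1),  # Move Left
    3: (0, 1),   # Move Right
}


def move(pos: Tuple[int, int], action: int, n: int) -> Tuple[int, int]:
    """Return the new position in an n x n room after taking an action.

    Assumption: a move that would leave the room keeps the agent in place.
    """
    row, col = pos
    d_row, d_col = ACTION_DELTAS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < n and 0 <= new_col < n:
        return (new_row, new_col)
    return pos
```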
## Reward Structure
Under default settings, the environment provides the rewards as given in the paper:
* 0 for every step (can be set to e.g. -0.05 to facilitate learning)
* 1 for picking up (not unlocking) a correct key
* -1 for picking up (not unlocking) a wrong key
* 0 for reaching a dead end / "dying"
* 10 for reaching the goal (gem)

All of these values can be specified when instantiating the environment directly or via `gym.make()`.
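
To see how these values add up over an episode, the following sketch computes the undiscounted return for a list of events. The event names, `DEFAULT_REWARDS`, and `episode_return` are hypothetical and are not the keyword arguments of the environment; check the environment's constructor for the actual parameter names.

```python
from typing import Dict, List

# Default reward values as listed above; the dictionary keys are made up
# for this example and are not the environment's parameter names.
DEFAULT_REWARDS: Dict[str, float] = {
    "step": 0.0,
    "correct_key": 1.0,
    "wrong_key": -1.0,
    "dead_end": 0.0,
    "gem": 10.0,
}


def episode_return(events: List[str], rewards: Dict[str, float] = DEFAULT_REWARDS) -> float:
    """Sum the rewards of all events in an episode (no discounting)."""
    return sum(rewards[event] for event in events)


# A solved episode: 20 steps, 3 correct keys picked up, then the gem.
solved_episode = ["step"] * 20 + ["correct_key"] * 3 + ["gem"]
default_return = episode_return(solved_episode)

# The same episode with a step penalty of -0.05, as suggested above.
shaped_rewards = {**DEFAULT_REWARDS, "step": -0.05}
shaped_return = episode_return(solved_episode, shaped_rewards)
```

With the defaults the step count does not affect the return, so a slow solution scores the same as a fast one; a small negative step reward makes shorter solutions preferable.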
## Installing the Environment
This implementation makes use of gym registration.
Install the environment by running inside this directory:
```bash
pip install -e gym-boxworld
```
**Remember that after changing the code you need to re-register the environment before the changes become effective.**

An instance can then be created by:
```python
import gym
gym.make('gym_boxworld:boxworld-v0', **kwargs)
```
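
Once an environment instance exists, a random policy is a quick way to check that it steps correctly. The helper below is a sketch and not part of this repository: `run_random_episode` is a hypothetical name, it assumes the four action IDs 0 to 3 from the Actions section, and it accepts both the older 4-tuple and the newer 5-tuple return value of `step()` because the installed gym version is not known here.

```python
import random
from typing import Optional, Tuple


def run_random_episode(env, max_steps: int = 1000, seed: Optional[int] = None) -> Tuple[float, int]:
    """Play one episode with uniformly random actions.

    Returns the total reward and the number of steps taken.
    Assumption: action IDs are the integers 0..3.
    """
    rng = random.Random(seed)
    env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = rng.randrange(4)
        result = env.step(action)
        if len(result) == 5:
            # Newer gym API: (obs, reward, terminated, truncated, info)
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            # Older gym API: (obs, reward, done, info)
            _, reward, done, _ = result
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps
```

It would be called as `run_random_episode(env)`, where `env` is the object returned by the `gym.make` call above.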
## Quick Game
```bash
python Human_playing_Commandline.py --gifs
```