https://github.com/juliareinforcementlearning/gridworlds.jl

Help! I'm lost in the flatland!

Host: GitHub
URL: https://github.com/juliareinforcementlearning/gridworlds.jl
Owner: JuliaReinforcementLearning
License: mit
Created: 2020-08-08T11:04:53.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2023-06-25T18:51:45.000Z (about 2 years ago)
Last Synced: 2025-04-23T11:23:32.690Z (3 months ago)
Topics: grid-world, gridworld, gridworld-environment, hacktoberfest, julia, makie, reinforcement-learning
Language: Julia
Homepage:
Size: 34.7 MB
Stars: 47
Watchers: 7
Forks: 9
Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.bib


# GridWorlds

A package for creating grid world environments for reinforcement learning in Julia. This package is designed to be lightweight and fast.

This package is inspired by [gym-minigrid](https://github.com/maximecb/gym-minigrid). In order to cite this package, please refer to the file `CITATION.bib`. Starring the repository on GitHub is also appreciated. For benchmarks, refer to `benchmarks/benchmarks.md`.

## Table of Contents

* [Getting Started](#getting-started)
* [Notes](#notes)
* [List of Environments](#list-of-environments)
    1. [SingleRoomUndirected](#singleroomundirected)
    1. [SingleRoomDirected](#singleroomdirected)
    1. [GridRoomsUndirected](#gridroomsundirected)
    1. [GridRoomsDirected](#gridroomsdirected)
    1. [SequentialRoomsUndirected](#sequentialroomsundirected)
    1. [SequentialRoomsDirected](#sequentialroomsdirected)
    1. [MazeUndirected](#mazeundirected)
    1. [MazeDirected](#mazedirected)
    1. [GoToTargetUndirected](#gototargetundirected)
    1. [GoToTargetDirected](#gototargetdirected)
    1. [DoorKeyUndirected](#doorkeyundirected)
    1. [DoorKeyDirected](#doorkeydirected)
    1. [CollectGemsUndirected](#collectgemsundirected)
    1. [CollectGemsDirected](#collectgemsdirected)
    1. [CollectGemsMultiAgentUndirected](#collectgemsmultiagentundirected)
    1. [DynamicObstaclesUndirected](#dynamicobstaclesundirected)
    1. [DynamicObstaclesDirected](#dynamicobstaclesdirected)
    1. [SokobanUndirected](#sokobanundirected)
    1. [SokobanDirected](#sokobandirected)
    1. [Snake](#snake)
    1. [Catcher](#catcher)
    1. [TransportUndirected](#transportundirected)
    1. [TransportDirected](#transportdirected)

## Getting Started

```julia
import GridWorlds as GW

# Each environment `Env` lives in its own module `EnvModule`
# For example, the `SingleRoomUndirected` environment lives inside the `SingleRoomUndirectedModule` module

env = GW.SingleRoomUndirectedModule.SingleRoomUndirected()

# reset the environment. All environments are randomized

GW.reset!(env)

# get names of actions that can be performed in this environment

GW.get_action_names(env)

# perform actions in the environment

GW.act!(env, 1) # move up
GW.act!(env, 2) # move down
GW.act!(env, 3) # move left
GW.act!(env, 4) # move right

# play an environment interactively inside the terminal

GW.play!(env)

# play and record the interaction in a file called recording.txt

GW.play!(env, file_name = "recording.txt")

# manually step through the frames in the recording

GW.replay(file_name = "recording.txt")

# replay the recording inside the terminal at a given frame rate

GW.replay(file_name = "recording.txt", frame_rate = 2)

# use the RLBase API

import ReinforcementLearningBase as RLBase

# wrap a game instance from this package to create an RLBase compatible environment

rlbase_env = GW.RLBaseEnv(env)

# perform RLBase operations on the wrapped environment

RLBase.reset!(rlbase_env)
state = RLBase.state(rlbase_env)
action_space = RLBase.action_space(rlbase_env)
reward = RLBase.reward(rlbase_env)
done = RLBase.is_terminated(rlbase_env)

rlbase_env(1) # move up
rlbase_env(2) # move down
rlbase_env(3) # move left
rlbase_env(4) # move right
```

## Notes

### Reinforcement Learning

This package does not intend to reinvent a fully usable reinforcement learning API. Instead, all the games in this package provide only the bare minimum needed for the game logic: the ability to reset an environment using `GW.reset!(env)` and to perform actions in it using `GW.act!(env, action)`. To use such a game for reinforcement learning, you would typically rely on a higher-level reinforcement learning API, such as the one offered by the `ReinforcementLearning.jl` package (the `RLBase` API). As of this writing, all the environments provide a default implementation of the `RLBase` API, which means that you can wrap a game from `GridWorlds.jl` and use it directly with the rest of the `ReinforcementLearning.jl` ecosystem.
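The split between game logic and the RL API can be illustrated with a minimal sketch of a hypothetical toy game, `Corridor`, which is not part of GridWorlds.jl. It only mirrors the shape described above: mutable game state, a `reset!` function, and an `act!` function that takes an integer action, with the reward and termination flag kept as plain fields that a higher-level wrapper could read. The corridor layout and the reward rule are assumptions made purely for illustration.

```julia
# Hypothetical toy game (NOT part of GridWorlds.jl) exposing only the
# bare-minimum game logic: `reset!` and `act!`.
mutable struct Corridor
    num_cells::Int     # number of cells in the corridor
    position::Int      # current cell of the agent (1-based)
    reward::Float32    # reward obtained from the most recent action
    done::Bool         # whether the agent has reached the last cell
end

Corridor(num_cells::Int) = Corridor(num_cells, 1, 0.0f0, false)

# Put the agent back at the start and clear the reward and termination flag.
function reset!(env::Corridor)
    env.position = 1
    env.reward = 0.0f0
    env.done = false
    return nothing
end

# Action 1 moves left, action 2 moves right; the ends of the corridor act as walls.
function act!(env::Corridor, action::Int)
    if action == 1
        env.position = max(env.position - 1, 1)
    elseif action == 2
        env.position = min(env.position + 1, env.num_cells)
    else
        throw(ArgumentError("invalid action $(action)"))
    end
    env.done = env.position == env.num_cells
    env.reward = env.done ? 1.0f0 : 0.0f0  # assumed rule: reward only on reaching the end
    return nothing
end
```

An RL wrapper built on top of such a game would call `act!` and then read `env.reward` and `env.done`, which is the kind of work the `RLBase` wrapper does for the games in this package.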

1. ### States

There are a few possible options for representing the state/observation of an environment. You can use the entire tile map, optionally augmented with other environment-specific information such as the agent's direction or the target (in `GoToTargetUndirected`). In several games, you can also use the `GW.get_sub_tile_map!` function to get a partial view of the tile map to be used as the observation.

All environments provide a default implementation of the `RLBase.state` function. Before running reinforcement learning experiments with an environment, make sure you understand exactly what information its state representation contains.
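To illustrate what a partial view means, the following sketch crops a square window centred on a position out of a `(num_objects, height, width)` tile map. `sub_view` is a hypothetical helper written with Base Julia only; it is not `GW.get_sub_tile_map!`, whose exact behavior (for example, how it treats cells outside the grid or the agent's direction) may differ. Here, cells outside the grid are assumed to be empty (all `false`).

```julia
# Hypothetical helper (NOT `GW.get_sub_tile_map!`): crop a
# (2radius + 1) x (2radius + 1) window of the tile map centred on (row, col).
# Cells that fall outside the tile map are left empty (all `false`).
function sub_view(tile_map::BitArray{3}, row::Int, col::Int, radius::Int)
    num_objects, height, width = size(tile_map)
    side = 2 * radius + 1
    view_map = falses(num_objects, side, side)
    for i in 1:side, j in 1:side
        r = row - radius + i - 1  # row in the full tile map
        c = col - radius + j - 1  # column in the full tile map
        if 1 <= r <= height && 1 <= c <= width
            view_map[:, i, j] = tile_map[:, r, c]  # copy the multi-hot object vector
        end
    end
    return view_map
end
```

A smaller, agent-centred observation like this keeps the input size fixed regardless of the size of the grid world, at the cost of hiding distant objects from the agent.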

1. ### Actions

As of this writing, all actions in all environments are discrete. To keep things simple and consistent, they are represented by the elements of `Base.OneTo(NUM_ACTIONS)`, that is, the integers from 1 to `NUM_ACTIONS`. To find out what each action does, call `GW.get_action_names(env)`, which returns a descriptive name for every action. For example:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_action_names(env)
(:MOVE_UP, :MOVE_DOWN, :MOVE_LEFT, :MOVE_RIGHT)
```

The order of the names matches the order of the actions: the `i`-th name describes action `i`.
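Because actions are plain integers, mapping an action to its name is just tuple indexing. The sketch below hard-codes the names from the `SingleRoomUndirected` output above and adds a hypothetical `apply_action` to show what each index might do to a `(row, col)` position. The offsets (with row 1 assumed to be the top of the map, and no wall or bounds checks) are an illustrative assumption, not taken from the GridWorlds.jl source.

```julia
# Names copied from the `GW.get_action_names(env)` output for SingleRoomUndirected.
const ACTION_NAMES = (:MOVE_UP, :MOVE_DOWN, :MOVE_LEFT, :MOVE_RIGHT)

# Actions are the integers 1:NUM_ACTIONS, i.e. Base.OneTo(NUM_ACTIONS).
const ACTIONS = Base.OneTo(length(ACTION_NAMES))

# The i-th name describes action i.
action_name(action::Integer) = ACTION_NAMES[action]

# Hypothetical (row, col) offset of each action; row 1 is assumed to be the top.
const ACTION_OFFSETS = Dict(
    :MOVE_UP => (-1, 0),
    :MOVE_DOWN => (1, 0),
    :MOVE_LEFT => (0, -1),
    :MOVE_RIGHT => (0, 1),
)

# Apply an action to a position (no wall or bounds checks in this sketch).
function apply_action(position::Tuple{Int,Int}, action::Integer)
    dr, dc = ACTION_OFFSETS[action_name(action)]
    return (position[1] + dr, position[2] + dc)
end
```

For example, `rand(ACTIONS)` samples a random action, and `action_name(3)` gives `:MOVE_LEFT`.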

1. ### Rewards and Termination

As mentioned before, to use these environments for reinforcement learning experiments, you would most likely use a higher-level API like `RLBase`, which should already provide a way to get these values. For example, in `RLBase`, the reward can be accessed using `RLBase.reward(env)`, and whether an environment has terminated can be checked by calling `RLBase.is_terminated(env)`. If you are using some other API and need more direct control, look at the implementation of that environment to see how it computes the reward and checks for termination.

### Tile Map

Each environment contains a tile map, which is a `BitArray{3}` that encodes the presence or absence of objects in the grid world. It has size `(num_objects, height, width)`. The second and third dimensions correspond to positions along the height and width of the tile map. The first dimension is a multi-hot encoding of the objects present at a given position: entry `i` is `true` exactly when the `i`-th object is at that position. You can get the names and ordering of the objects along the first dimension using the following method:

```julia
julia> env = GW.SingleRoomUndirectedModule.SingleRoomUndirected();

julia> GW.get_object_names(env)
(:AGENT, :WALL, :GOAL)
```
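The layout can be reproduced by hand with Base Julia. The following sketch assumes the object ordering `(:AGENT, :WALL, :GOAL)` shown above, builds a small walled room, and reads back the multi-hot vector at a position. The room size, the positions, and the helpers `object_index` and `objects_at` are illustrative assumptions; the environments in GridWorlds.jl build and update their tile maps internally.

```julia
# Object ordering assumed from the `GW.get_object_names(env)` output above.
const OBJECT_NAMES = (:AGENT, :WALL, :GOAL)

# An empty tile map of size (num_objects, height, width).
height, width = 4, 5
tile_map = falses(length(OBJECT_NAMES), height, width)

# Index of an object along the first dimension.
object_index(name::Symbol) = findfirst(==(name), OBJECT_NAMES)

# Surround the room with walls (top/bottom rows, left/right columns).
tile_map[object_index(:WALL), [1, height], :] .= true
tile_map[object_index(:WALL), :, [1, width]] .= true

# Place the goal and the agent in the interior.
tile_map[object_index(:GOAL), 3, 4] = true
tile_map[object_index(:AGENT), 2, 2] = true

# Names of the objects present at (row, col), read from the multi-hot vector tile_map[:, row, col].
objects_at(tile_map, row, col) =
    [OBJECT_NAMES[i] for i in axes(tile_map, 1) if tile_map[i, row, col]]
```

Since the encoding is multi-hot, several objects can share a cell; for instance, setting the goal bit at the agent's position makes `objects_at(tile_map, 2, 2)` return both `:AGENT` and `:GOAL`.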

### Navigation

Several environments contain the word `Undirected` or `Directed` in their name. This refers to the navigation style of the agent. `Undirected` means that the agent has no direction associated with it and navigates by moving directly up, down, left, or right on the tile map. `Directed` means that the agent has a direction associated with it and navigates by moving forward or backward along its current direction, or by turning left or right relative to it. There are 4 directions: `UP`, `DOWN`, `LEFT`, and `RIGHT`.
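One way to model directed navigation is to store the four directions in clockwise order, so that turning becomes index arithmetic and moving backward is the negated forward step. This is a sketch under assumptions (clockwise ordering, row 1 at the top, directions as symbols); it does not show how GridWorlds.jl represents directions internally.

```julia
# The four directions in clockwise order, so turning is index arithmetic.
const DIRECTIONS = (:UP, :RIGHT, :DOWN, :LEFT)

# (row, col) offset of one forward step along each direction; row 1 is assumed to be the top.
const DIRECTION_OFFSETS = Dict(:UP => (-1, 0), :RIGHT => (0, 1), :DOWN => (1, 0), :LEFT => (0, -1))

# Turning right steps clockwise through DIRECTIONS, turning left steps counter-clockwise.
turn_right(dir::Symbol) = DIRECTIONS[mod1(findfirst(==(dir), DIRECTIONS) + 1, 4)]
turn_left(dir::Symbol) = DIRECTIONS[mod1(findfirst(==(dir), DIRECTIONS) - 1, 4)]

# Moving forward adds the direction's offset; moving backward subtracts it.
function move_forward(pos::Tuple{Int,Int}, dir::Symbol)
    dr, dc = DIRECTION_OFFSETS[dir]
    return (pos[1] + dr, pos[2] + dc)
end

function move_backward(pos::Tuple{Int,Int}, dir::Symbol)
    dr, dc = DIRECTION_OFFSETS[dir]
    return (pos[1] - dr, pos[2] - dc)
end
```

With this representation, an undirected agent needs only the offsets, while a directed agent additionally carries one of the four symbols as part of its state.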

### Interactive Playing and Recording

All the environments can be played directly inside the REPL. These interactive sessions can also be recorded in plain text files and replayed in the terminal. There are two ways to replay a recording:
1. The default way is to manually step through each recorded frame. This allows you to move through the frames one by one at your own pace using keyboard inputs.
1. The second way is to replay the frames at a given frame rate. This would loop through all the frames once and then (and only then) exit the replay.


### Programmatic Recording of Agent's Behavior

To programmatically record the behavior of an agent during an episode, you can log the string representation of the environment at each step, prefixed with a delimiter. You can also log other arbitrary information, such as the total reward so far. You can then use the `GW.replay` function to replay the recording inside the terminal. The string representation of an environment can be obtained using `repr(MIME"text/plain"(), env)`. Here is an example:

```julia
import GridWorlds as GW
import ReinforcementLearningBase as RLBase

game = GW.SingleRoomUndirectedModule.SingleRoomUndirected()
env = GW.RLBaseEnv(game)
frame_start_delimiter = "SOME_FRAME_START_DELIMITER"

# Append one frame (delimiter, frame number, rendered environment, running reward) to `io`.
function write_frame(io::IO, env, frame_number, total_reward, frame_start_delimiter)
    print(io, frame_start_delimiter)
    print(io, "frame_number: $(frame_number)\n")
    print(io, repr(MIME"text/plain"(), env))
    print(io, "\ntotal_reward: $(total_reward)")
end

# Run one episode with a random policy and return the recording as a string.
# Wrapping the loop in a function avoids `global` and repeated string concatenation.
function record_episode(env, frame_start_delimiter)
    RLBase.reset!(env)
    io = IOBuffer()
    total_reward = zero(RLBase.reward(env))
    frame_number = 1
    write_frame(io, env, frame_number, total_reward, frame_start_delimiter)

    while !RLBase.is_terminated(env)
        action = rand(RLBase.action_space(env))
        env(action)
        total_reward += RLBase.reward(env)
        frame_number += 1
        write_frame(io, env, frame_number, total_reward, frame_start_delimiter)
    end

    return String(take!(io))
end

write("recording.txt", record_episode(env, frame_start_delimiter))

GW.replay(file_name = "recording.txt", frame_start_delimiter = frame_start_delimiter)
```

In `ReinforcementLearning.jl`, you can create a [hook](https://juliareinforcementlearning.org/docs/How_to_use_hooks/) for recording the agent's behavior at any point during training.
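Since a recording is plain text split into frames by the delimiter, replaying it at a fixed frame rate can be sketched in a few lines. `replay_frames` is a hypothetical stand-in written for illustration, not `GW.replay`; it assumes an ANSI-capable terminal for clearing the screen between frames and discards any text that appears before the first delimiter.

```julia
# Hypothetical sketch (NOT `GW.replay`): print each frame of a recording,
# pausing 1 / frame_rate seconds between frames. Returns the number of frames.
function replay_frames(io::IO, recording::AbstractString, frame_start_delimiter::AbstractString;
                       frame_rate::Real = 2)
    # Everything before the first delimiter is not part of any frame.
    frames = split(recording, frame_start_delimiter)[2:end]
    for frame in frames
        print(io, "\e[2J\e[H")  # ANSI escape codes: clear the screen, move the cursor to the top left
        println(io, frame)
        sleep(1 / frame_rate)
    end
    return length(frames)
end
```

For instance, `replay_frames(stdout, read("recording.txt", String), frame_start_delimiter)` would play back a recording written in the delimiter-prefixed format shown above.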

## List of Environments

1. ### SingleRoomUndirected