https://github.com/jvmncs/safe-grid-agents
Training (hopefully) safe agents in gridworlds
- Host: GitHub
- URL: https://github.com/jvmncs/safe-grid-agents
- Owner: jvmncs
- License: apache-2.0
- Created: 2018-04-21T19:00:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-05-12T20:14:20.000Z (over 6 years ago)
- Last Synced: 2025-05-01T14:37:00.355Z (5 months ago)
- Topics: gridworld, gym, reinforcement-learning, safe-agents, safe-reinforcement-learning
- Language: Python
- Homepage:
- Size: 222 KB
- Stars: 25
- Watchers: 7
- Forks: 4
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# safe-grid-agents
Training (hopefully) safe agents in gridworlds.
Emphasizing extensibility, modularity, and accessibility.
## Layout
- `safe_grid_agents/common`: Core codebase. Includes abstract base
classes for a variety of agents, their associated warmup/learn/eval
functions, and a utilities file.
- `main.py`: Python executable for composing training jobs.
- `safe_grid_agents/parsing`: Helpers that construct a flexible CLI
for `main.py`.
- `safe_grid_agents/ssrl`: Agents that implement semi-supervised
reinforcement learning and their associated warmup functions.

## Installation
When installing with pip, make sure to use the
`process-dependency-links` flag:

``` {.sh}
pip install . --process-dependency-links
```

URL-based dependencies are available for audit at the following
repositories and forks:

- [safe-grid-gym](https://github.com/jvmancuso/safe-grid-gym)
- [ai-safety-gridworlds](https://github.com/jvmancuso/ai-safety-gridworlds)

If you plan on developing this library, make sure to add an `-e` flag to
the above pip install command.

This repo requires [tensorboardX](https://github.com/lanpa/tensorboardX)
for monitoring and visualizing agent learning, as well as PyTorch for
implementing certain agents. Currently, tensorboardX does not
function properly without TensorFlow installed. Since the installation
process of these packages can vary from system to system, we exclude them
from our build process. There are multiple tutorials online for
installing both. For example, on OS X without CUDA support I'd go with:

``` {.sh}
# Replace `tensorflow` with `tensorflow-gpu` if you have a GPU.
pip install torch torchvision tensorflow
```
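
To double-check that these external dependencies resolved correctly, a quick sanity check (not part of the repo's tooling, just an illustration) is to import them:

``` {.python}
# Sanity check: confirm PyTorch, TensorFlow, and tensorboardX import cleanly.
import torch
import tensorflow as tf
import tensorboardX  # noqa: F401 -- the import succeeding is the check here

print("torch", torch.__version__)
print("tensorflow", tf.__version__)
```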
# Usage

## Training agents
You can use the CLI to `main.py` to modularly drop agents into arbitrary
safety gridworlds. For example, `python main.py boat tabular-q --lr .5`
will train a TabularQAgent on the BoatRaceEnvironment with a learning
rate of 0.5.

There are a number of customizable parameters to modify training runs.
These parameters are split into three groups:

- Core arguments: args that are shared across all agents/environments. Found in
[`parsing/core_parser_configs.yaml`](https://github.com/jvmancuso/safe-grid-agents/blob/master/safe_grid_agents/parsing/core_parser_configs.yaml).
- Environment arguments: args specific to environments but shared across
agents. Currently empty, but could be useful for specific environments,
depending on the agent. Found in
[`parsing/env_parser_configs.yaml`](https://github.com/jvmancuso/safe-grid-agents/blob/master/safe_grid_agents/parsing/env_parser_configs.yaml).
- Agent arguments: args specific to agents. Most hyperparameters live
here. Found in
[`parsing/agent_parser_configs.yaml`](https://github.com/jvmancuso/safe-grid-agents/blob/master/safe_grid_agents/parsing/agent_parser_configs.yaml).

The generalized form for the CLI is
``` {.sh}
python main.py env agent
```
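
The agent- and environment-specific flags come from the YAML config files listed above. As a rough, hypothetical illustration of the idea (not the repo's actual parsing code or config schema), each top-level entry of such a config could be registered as an `argparse` flag:

``` {.python}
# Hypothetical sketch of turning a parser config YAML into CLI flags;
# the real helpers live in safe_grid_agents/parsing and are more featureful.
import argparse
import yaml


def add_args_from_config(parser, config_path):
    """Register each top-level entry of a parser config YAML as a CLI argument."""
    with open(config_path) as f:
        configs = yaml.safe_load(f) or {}
    for name, spec in configs.items():
        if not isinstance(spec, dict):
            spec = {}
        parser.add_argument(f"--{name}", default=spec.get("default"), help=spec.get("help", ""))
    return parser


parser = argparse.ArgumentParser(description="safe-grid-agents training")
add_args_from_config(parser, "safe_grid_agents/parsing/core_parser_configs.yaml")
args = parser.parse_args()
```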
## Ray Tune

We support using Ray Tune to configure hyperparameters. Look at
`TUNE_DEFAULT_CONFIG` in `main.py` to see which are currently supported.
If you specify a tunable parameter on the CLI with the `-t` or `--tune`
flag, it will be automatically set.

### Example
This will automatically set parameters for the learning rate `lr` and
discount rate `discount`.

``` {.sh}
# `-t` and `--tune` are equivalent, and can be used interchangeably.
python3 main.py -t lr --tune discount boat tabular-q
```

## Monitoring agent learning with tensorboardX
You can use the `--log-dir`/`-L` flag to the main.py script to specify a
directory for saving training and evaluation metrics across runs. I
suggest a pattern similar to

``` {.sh}
logs/sokoban/deep-q/lr5e-4
# that is, <env>/<agent>/<hyperparameters>
```

If no log-dir is specified for main.py, logging defaults to the `runs/`
directory, which can be helpful to separate debugging runs from training
runs.

Given a log directory `<log-dir>`, simply run `tensorboard --logdir <log-dir>`
to visualize an agent's learning.
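
Since logging goes through tensorboardX, the metrics in the log directory are standard event files. If you want to log extra quantities from your own experiments, a minimal sketch of tensorboardX's `SummaryWriter` (the log directory and tag below are only examples, not ones the repo uses) looks like:

``` {.python}
# Minimal tensorboardX usage sketch; directory and tag names are illustrative.
from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="logs/sokoban/deep-q/lr5e-4")
for step in range(100):
    fake_return = 0.01 * step  # stand-in for an episode return
    writer.add_scalar("eval/episode_return", fake_return, step)
writer.close()
```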
# Development

## Code style
We use [black](https://github.com/ambv/black) for auto-formatting
according to a consistent style guide. To auto-format, run `black .`
from inside the repo folder. To make this more convenient, you can
install plugins for your preferred text editor that auto-format on every
save.

## Adding agents
Steps to take when adding a new agent.
1. Determine where the agent should live; for example, if you're
testing a new baseline from standard RL, include it in `common`, but
if you're adding a new SSRL agent, add it to `ssrl`. We'll refer to
this folder as `<agent_dir>`.
2. (optional) If your agent doesn't fall into these categories, create
a new top-level subdirectory `<agent_dir>` for it (using an informative
abbreviation). You should also create an abstract base class
establishing the distinguishing functionality of your agent class in
`<agent_dir>/base.py` (see the sketch after this list). For example:
- SSRL requires a stronger agent H to learn from, so we require a
`query_H` method for each agent.
- Additionally, following [Everitt et
al.](https://arxiv.org/abs/1705.08417), we require a `learn_C`
method to learn the probability of the state being corrupt.
3. (optional) Implement a warmup function in `<agent_dir>/warmup.py`, and
make sure it's importable from `common/warmup.py`. The `noop`
default warmup function works for agents that don't require any
special functionality.
4. Implement a function describing the agent's learning feedback loop
in `<agent_dir>/learn.py`. See
[`common/learn.py`](https://github.com/jvmancuso/safe-grid-agents/blob/master/safe_grid_agents/common/learn.py)
for an example distinguishing DQN from a tabular Q-learning agent.
5. (optional) Implement a function in `<agent_dir>/eval.py` describing the
evaluation feedback loop. The `default_eval` function in
`common/eval.py` should cover most cases, so you may not need to add
anything for evaluation.
6. Add a new entry for the agent's CLI arguments in
`parsing/agent_parser_configs.yaml`. Follow the existing pattern and
check for previously implemented YAML anchors that cover the
arguments you need (e.g. `learnrate`, `epsilon-anneal`, etc.). These
configs should be organized by where they appear in the folder
structure of the repository.
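
For step 2, a minimal sketch of what such an abstract base class might look like; only `query_H` and `learn_C` come from the SSRL example above, while the class name and the `act` method are illustrative rather than the repo's actual API:

``` {.python}
# Hypothetical <agent_dir>/base.py sketch; names other than query_H/learn_C
# are illustrative and not the repo's actual API.
import abc


class BaseCustomAgent(abc.ABC):
    """Abstract base class pinning down what agents in this family must implement."""

    @abc.abstractmethod
    def act(self, state):
        """Choose an action for the current state."""

    @abc.abstractmethod
    def query_H(self, state):
        """Query the stronger agent H for feedback on a state."""

    @abc.abstractmethod
    def learn_C(self, state, corrupt):
        """Update the estimate of the probability that the state is corrupt."""
```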