https://github.com/dcbr/sdab
This repository provides the necessary code to reproduce all supplementary experiments of the "Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies" paper [1, Appendix B].
- Host: GitHub
- URL: https://github.com/dcbr/sdab
- Owner: dcbr
- License: bsd-3-clause
- Created: 2022-06-21T18:37:34.000Z
- Default Branch: master
- Last Pushed: 2022-06-21T18:47:01.000Z
- Last Synced: 2025-09-05T08:41:01.147Z
- Language: Python
- Size: 30.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.bib
# State-dependent action bounds
This repository provides the necessary code to reproduce all supplementary experiments of the *"Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies"* paper [1, Appendix B].
An example implementation of state-dependent action bounds is provided for the SAC method, using [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) and [PyTorch](https://pytorch.org).
Custom state-dependent action bounds are defined for the `Pendulum-v1` and `LunarLanderContinuous-v2` [OpenAI Gym environments](https://gymlibrary.ml).
Refer to the paper's supplementary material for further details.
# Installation
1. Clone this repository.
``git clone https://github.com/dcbr/sdab``
``cd sdab``
2. Install the required packages.
Optionally, create a virtual environment first (using e.g. conda or venv).
``python -m pip install -r requirements.txt``
# Usage
Run the `action_bounds.py` script with suitable arguments to train the models, or to evaluate and analyze their performance.
For example,
``python action_bounds.py --mode train --envs LunarLanderContinuous-v2 --rescale lin hyp``
trains on the lunar lander environment (with stabilizing action bounds) using both the linear and the hyperbolic rescaling functions.
To reproduce all results of Appendix B, first train all models with ``python action_bounds.py``, followed by the analysis ``python action_bounds.py --mode analyze``. Beware that this might take a while to complete, depending on your hardware!
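The two-step reproduction workflow above (train everything, then analyze) can also be scripted. The sketch below only uses the command-line flags documented in this README; the helper names `action_bounds_cmd` and `reproduce_appendix_b` are hypothetical, and it assumes that multi-valued options are passed as space-separated values (as in `--rescale lin hyp`) and that it is run from the repository root.

```python
import subprocess
import sys


def action_bounds_cmd(mode=None, **options):
    """Build an argument list for action_bounds.py (hypothetical helper).

    `mode` maps to --mode; every other keyword (e.g. envs, algs, rescale,
    seeds) maps to --<name> followed by its value(s). Lists are expanded
    into space-separated values, as in `--rescale lin hyp`.
    """
    cmd = [sys.executable, "action_bounds.py"]
    if mode is not None:
        cmd += ["--mode", mode]
    for name, values in options.items():
        if not isinstance(values, (list, tuple)):
            values = [values]
        cmd += [f"--{name}"] + [str(v) for v in values]
    return cmd


def reproduce_appendix_b(dry_run=False):
    """Train all models with the default settings, then run the analysis."""
    commands = [action_bounds_cmd(), action_bounds_cmd("analyze")]
    for cmd in commands:
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True aborts the workflow if training fails,
            # so the analysis never runs on incomplete results.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `reproduce_appendix_b()` runs both steps in sequence, while `action_bounds_cmd("train", envs="LunarLanderContinuous-v2", rescale=["lin", "hyp"])` builds the single training command from the example above. Passing an argument list (rather than a shell string) to `subprocess.run` avoids any shell quoting issues.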
A summary of the script's most relevant parameters is provided below.
Check ``python action_bounds.py --help`` for a full overview of supported parameters.
| Parameter | Supported values | Description |
|:------------|:------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `--mode` | `train`, `eval`, `analyze` | Run mode. Either train models, evaluate (and visualize) them or analyze and summarize the results (creating the plots shown in the paper). |
| `--envs` | `Pendulum-v1`, `LunarLanderContinuous-v2` | OpenAI gym environment ID. |
| `--algs` | `sac`, `bsac` | Reinforcement learning algorithm to use. Either the bounded SAC algorithm (`bsac`), with enforced state-dependent action bounds, or the default SAC algorithm (`sac`), without enforcement of such bounds. |
| `--rescale` | `lin`, `pwl`, `hyp`, `clip` | Rescaling function σ to use. Either linear (`lin`), piecewise linear (`pwl`) or hyperbolic (`hyp`) rescaling; or clipping (`clip`). |
| `--seeds`   | Any integer number *N*                    | Random seeds to use; experiments are repeated for each of the provided seeds. Alternatively, a negative number *-N* can be given, in which case *N* seeds are chosen at random. |
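To illustrate what the `lin` and `clip` options in the table do, the sketch below maps an action onto state-dependent bounds `[lower(s), upper(s)]`. It is not the repository's implementation: the function names are hypothetical, the bounds are passed in as already-evaluated arrays, and it assumes that the policy's squashed output lies in [-1, 1] (as for SAC's tanh-squashed policy) and that clipping is applied to an action from the environment's original action range. The piecewise linear (`pwl`) and hyperbolic (`hyp`) rescaling functions are defined in the paper and are not reproduced here.

```python
import numpy as np


def linear_rescale(y, lower, upper):
    """Affinely map a squashed policy output y in [-1, 1] onto [lower, upper].

    y = -1 maps to lower and y = 1 maps to upper; this works element-wise
    for vector actions when lower/upper are arrays of the same shape.
    """
    y = np.asarray(y, dtype=float)
    return lower + (upper - lower) * (y + 1.0) / 2.0


def clip_action(a, lower, upper):
    """Clip an action from the environment's original range to [lower, upper]."""
    return np.clip(np.asarray(a, dtype=float), lower, upper)


# Illustrative (made-up) bounds for a 2-D action, e.g. evaluated at some state s.
lower = np.array([-0.5, 0.0])
upper = np.array([1.0, 0.5])
print(linear_rescale([0.0, 0.0], lower, upper))  # midpoint of the bounds: [0.25 0.25]
print(clip_action([2.0, -1.0], lower, upper))    # [1. 0.]
```

Both functions guarantee that the resulting action satisfies the bounds. The difference lies in how out-of-range behavior is handled: linear rescaling squeezes the whole policy output into the feasible interval and stays differentiable, whereas clipping maps every infeasible action onto the nearest bound, where its gradient with respect to the action is zero.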
# References
[1] De Cooman, B., Suykens, J., Ortseifen, A.: Enforcing hard state-dependent action bounds on deep reinforcement learning policies. Accepted for *8th International Conference on Machine Learning, Optimization & Data Science, LOD 2022*.