https://github.com/geyang/model-free
A getting started repo for state-of-the-art model-free rl algorithms
- Host: GitHub
- URL: https://github.com/geyang/model-free
- Owner: geyang
- License: mit
- Created: 2022-12-23T19:47:09.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-16T17:21:59.000Z (over 2 years ago)
- Last Synced: 2025-01-10T12:58:28.035Z (5 months ago)
- Language: Jupyter Notebook
- Size: 32.7 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Model-free RL Baselines
This repository combines DrQv2, PPO, and RFF-DrQ into a single codebase. DrQ is closer to SAC, whereas DrQv2 is DDPG plus tricks from TD3. Both were originally implemented for control from pixels. On state-space inputs, DrQ works better than DrQv2 because the policy noise helps with exploration. Denis' state-space SAC implementation is the SOTA, and is better than either DrQ or DrQv2 on state-space DeepMind Control tasks.
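
For intuition, here is what that distinction looks like in code. This is an illustrative sketch, not code from this repo, and the names (`actor`, `target_critic`, `noise_std`, `noise_clip`) are placeholders: DrQv2 bootstraps a DDPG-style deterministic action with TD3's target-policy smoothing and clipped double-Q.

```python
# Illustrative sketch only, not code from this repo: the DDPG + TD3-style
# critic target that DrQv2 uses.
import torch

def drqv2_style_target(reward, discount, next_obs, actor, target_critic,
                       noise_std=0.2, noise_clip=0.3):
    """Clipped double-Q bootstrap with smoothed target actions (TD3 tricks on DDPG)."""
    with torch.no_grad():
        next_action = actor(next_obs)                              # deterministic policy
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-1.0, 1.0)       # target-policy smoothing
        q1, q2 = target_critic(next_obs, next_action)              # twin target critics
        return reward + discount * torch.min(q1, q2)               # clipped double-Q
```

SAC and DrQ instead sample the next action from a stochastic policy and subtract an entropy term from the min-Q value, which is the extra stochasticity that helps exploration on state inputs.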
Therefore, for state-space control tasks, you want to start with `sac_denis_rff`. We actually want a clean version of `sac_denis` without the RFF parts, but that is not implemented yet.
**On-policy methods.** Policy-gradient methods usually have better wall-clock time than off-policy actor-critic methods. For Isaac Gym environments, we use PPO because the fast sampling makes sample efficiency less important (the actor-critic SOTA results measure sample efficiency, NOT wall-clock time). Even so, sample-efficient SAC implementations on Isaac Gym are known to improve wall-clock time over PPO by about 20%-30%, depending on the domain.
For this reason we also want to add a SAC + Isaac Gym implementation. Currently, very little work in locomotion looks at off-policy methods, and HER is completely missing. The few exceptions are:
- Jason Peng's goal shooting paper, which uses an off-policy algorithm for the high-level policy
- Ilya Kostrikov's learning-in-the-real-world paper -- Sergey Levine commented that the main reason such a complicated algorithm worked on the robot is that Ilya was the one who implemented it.

## Algorithms
This repository contains implementations of the following papers in a unified framework:
- [PPO (Schulman et al., 2017)](https://arxiv.org/abs/1707.06347)
- [SAC (Haarnoja et al., 2018)](https://arxiv.org/abs/1812.05905)
- [DrQv2 (Yarats et al., 2022)](https://arxiv.org/abs/2107.09645)

With standardized architectures and hyper-parameters, these implementations should attain SOTA.
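
Since the three algorithms live in a unified framework, they should share a single agent interface so that the training loop, evaluation, and logging code stay identical across them. The sketch below is a hypothetical illustration of such an interface; the class and method names are not taken from this repository.

```python
# Hypothetical sketch of the kind of shared interface a unified framework implies;
# the names here are illustrative, not taken from this repository.
from typing import Dict, Protocol
import numpy as np

class Agent(Protocol):
    def act(self, obs: np.ndarray, step: int, eval_mode: bool = False) -> np.ndarray:
        """Select an action for the current observation."""
        ...

    def update(self, batch, step: int) -> Dict[str, float]:
        """Run one gradient step and return scalar metrics for logging."""
        ...

# With a common interface, one training/evaluation loop and one logging path
# can drive PPO, SAC, and DrQv2 with only hyper-parameters swapped out.
```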
## Setup
All launching and analysis code resides in the `model_free_analysis` subfolder. The `.jaynes` configuration files are under the project root.
Each experiment occupies one subfolder, and ML-Logger automatically logs according to this folder structure. This logic is implemented in [./rl_transfer_analysis/__init__.py](./rl_transfer_analysis/__init__.py).
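
As a rough sketch of that logic (this is an assumption about what the `__init__.py` does, not a copy of it), the logging prefix can be derived from where the experiment script sits inside the analysis folder:

```python
# A sketch of deriving a logging prefix from the experiment's folder structure.
# This is an assumption, not a copy of ./rl_transfer_analysis/__init__.py;
# names are illustrative.
from pathlib import Path

def prefix_from_file(file: str, root: str = "model_free_analysis") -> str:
    """Map .../model_free_analysis/ppo/sweep.py -> 'model_free_analysis/ppo/sweep'."""
    parts = Path(file).resolve().with_suffix("").parts
    return "/".join(parts[parts.index(root):])

# Example: prefix = prefix_from_file(__file__), then hand the prefix to the logger.
```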
## ML-Logger Env Setup
1. Install `ml-logger`.
```bash
pip install ml-logger
```

2. Add the following environment variables to your `~/.bashrc`:
```bash
export ML_LOGGER_ROOT=http://:8080
export ML_LOGGER_USER=$USER
export ML_LOGGER_TOKEN=
```
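
Before launching anything, you can confirm that these variables are visible from Python. This is a small convenience check, not part of this repo:

```python
# Convenience check, not part of this repo: confirm the ML-Logger variables
# from ~/.bashrc are visible in the current environment before launching.
import os

for var in ("ML_LOGGER_ROOT", "ML_LOGGER_USER", "ML_LOGGER_TOKEN"):
    value = os.environ.get(var)
    print(f"{var} = {value!r}" if value else f"{var} is NOT set -- source ~/.bashrc first")
```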

## Launching via SSH and Docker (on the VisionGPU Cluster)

1. Update `jaynes`, `ml-logger`, and `params-proto` to the latest version:
```bash
pip install -U jaynes ml-logger params-proto
```

2. Add `NFS_PATH=/data/whatever/misc/$USER` to your `.bashrc` file. We use this parameter in the `.jaynes.config`:
```bash
echo "export NFS_PATH=/data/whatever/misc/$USER" >> ~/.bashrc
```

3. Install `aws-cli` using the following command:
```bash
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install -i $NFS_PATH/aws-cli -b $NFS_PATH/bin
echo "export PATH=$NFS_PATH/bin:$PATH" >> ~/.bashrc
```

After this, you should be able to run:
```bash
aws --version
```

4. Add your `aws` credentials so that you can use the `s3` copy command:
```bash
aws configure
```

If `aws s3 cp ` is already working, you can skip steps 3 and 4.
Now if you run with
```python
jaynes.configure('visiongpu')
jaynes.run(train_fn)
jaynes.listen()
```

it should correctly package the code and launch it on the VisionGPU cluster.
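
The `train_fn` handed to `jaynes.run` is just an ordinary top-level function. A minimal placeholder, illustrative only and not this repo's actual entry point, looks like:

```python
# Minimal placeholder for the function handed to jaynes.run; illustrative only,
# not the actual entry point of this repo.
import jaynes

def train_fn():
    # a real entry point would construct the agent, environment, and logger here
    print("hello from the remote worker")

if __name__ == "__main__":
    jaynes.configure('visiongpu')   # same calls as in the snippet above
    jaynes.run(train_fn)
    jaynes.listen()                 # block locally and stream back the remote output
```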
## Training & Evaluation
**This is stale and needs to be updated.** The `scripts` directory contains training and evaluation bash scripts for all the included algorithms. Alternatively, you can call the Python scripts directly, e.g. for training call:
```bash
python3 src/train.py \
--algorithm soda \
--aux_lr 3e-4 \
--seed 0
```

to run SODA on the default task, `walker_walk`. This should give you an output of the form:
```
Working directory: logs/walker_walk/soda/0
Evaluating: logs/walker_walk/soda/0
| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730
| train | E: 1 | S: 250 | D: 70.1 s | R: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | AUXLOSS: 0.0000
```
where `ER` and `ERTEST` correspond to the average return in the training and test environments, respectively. You can select the test environment used in evaluation with the `--eval_mode` argument, which accepts one of `(train, color_easy, color_hard, video_easy, video_hard)`.
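
If you want to pull the returns out of that console format programmatically, a small convenience sketch (based only on the sample output above, not a script shipped with this repo) could look like:

```python
# Convenience sketch, not part of this repo: pull the returns out of
# eval lines like "| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730".
import re

EVAL_LINE = re.compile(r"\| eval \| S: (\d+) \| ER: ([\d.]+) \| ERTEST: ([\d.]+)")

def parse_eval_log(path):
    """Return a list of (step, train_return, test_return) tuples."""
    rows = []
    with open(path) as f:
        for line in f:
            match = EVAL_LINE.search(line)
            if match:
                step, er, er_test = match.groups()
                rows.append((int(step), float(er), float(er_test)))
    return rows
```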
## Results

Work in progress.
## Acknowledgements
We want to thank the numerous researchers and engineers whose work this implementation is based on. This benchmark is a product of our work on . Our SAC implementation is based on [this repository](https://github.com/denisyarats/pytorch_sac_ae), the original DMControl is available [here](https://github.com/deepmind/dm_control), and the gym wrapper for it is available [here](https://github.com/denisyarats/dmc2gym). The PAD, RAD, and CURL baselines are based on their official implementations, available [here](https://github.com/nicklashansen/policy-adaptation-during-deployment), [here](https://github.com/MishaLaskin/rad), and [here](https://github.com/MishaLaskin/curl), respectively.