https://github.com/young-geng/cql
Conservative Q Learning on top of SAC
- Host: GitHub
- URL: https://github.com/young-geng/cql
- Owner: young-geng
- License: mit
- Created: 2021-08-17T23:51:08.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-10-15T00:47:50.000Z (about 2 years ago)
- Last Synced: 2023-08-12T07:58:50.013Z (over 1 year ago)
- Topics: pytorch, reinforcement-learning
- Language: Python
- Homepage:
- Size: 1.05 MB
- Stars: 90
- Watchers: 5
- Forks: 19
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# CQL
A simple and modular implementation of the [Conservative Q Learning](https://arxiv.org/abs/2006.04779) and [Soft Actor Critic](https://arxiv.org/abs/1812.05905) algorithms in PyTorch.

If you like JAX, check out my [reimplementation of this codebase in JAX](https://github.com/young-geng/JaxCQL), which runs 4 times faster.
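For readers new to CQL, the following is a rough, dependency-free sketch of the conservative regularizer that CQL adds to the usual SAC critic loss. It is not this repository's code: the function names `logsumexp` and `cql_penalty` are hypothetical, each state is assumed to have a small finite set of candidate actions, and the importance weighting and temperature used in practical implementations are left out. It also assumes that the min Q weight (`cql.cql_min_q_weight`) is the multiplier on this term.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_candidates, q_data, min_q_weight):
    """Simplified CQL regularizer (illustrative only).

    q_candidates: per-state lists of Q-values for candidate actions.
    q_data:       per-state Q-value of the action stored in the dataset.
    min_q_weight: assumed to correspond to `cql.cql_min_q_weight`.
    """
    # Soft maximum over candidate actions minus the Q-value of the dataset action.
    gaps = [logsumexp(qs) - qd for qs, qd in zip(q_candidates, q_data)]
    return min_q_weight * sum(gaps) / len(gaps)


if __name__ == "__main__":
    # One state, two candidate actions with Q = 0, dataset action with Q = 0.
    print(round(cql_penalty([[0.0, 0.0]], [0.0], 5.0), 4))  # 5 * ln(2) -> 3.4657
```

In this simplified form, a larger weight penalizes high Q-values on actions outside the dataset more strongly relative to the actions the dataset actually contains.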
## Installation
1. Install and use the included Anaconda environment
```
$ conda env create -f environment.yml
$ source activate SimpleSAC
```
You'll need to [get your own MuJoCo key](https://www.roboti.us/license.html) if you want to use MuJoCo.

2. Add this repo directory to your `PYTHONPATH` environment variable.
```
export PYTHONPATH="$PYTHONPATH:$(pwd)"
```

## Run Experiments
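Before launching a run, it may be worth confirming that the `PYTHONPATH` change took effect. A minimal check, assuming the package directory is named `SimpleSAC` as in the commands in this README (the helper name `is_importable` is made up for illustration):

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be found on the current import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Prints False if the repo directory is not on PYTHONPATH in this shell.
    print("SimpleSAC importable:", is_importable("SimpleSAC"))
```

If this prints `False`, the `export` above was probably run in a different shell or from a different directory.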
You can run SAC experiments using the following command:
```
python -m SimpleSAC.sac_main \
--env 'HalfCheetah-v2' \
--logging.output_dir './experiment_output'
```
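If you would rather launch runs from a Python script (for example to loop over several settings), a small wrapper along these lines may help. This is only a sketch: `build_command` is a hypothetical helper, and it uses only the flags shown in this README (`--env`, `--logging.output_dir`, `--device`).

```python
import shlex
import sys


def build_command(module, env, output_dir, device=None):
    """Assemble the argv list for `python -m <module>` using the README's flags."""
    cmd = [
        sys.executable, "-m", module,
        "--env", env,
        "--logging.output_dir", output_dir,
    ]
    if device is not None:
        # Same `--device=<name>` form as used elsewhere in this README.
        cmd.append(f"--device={device}")
    return cmd


if __name__ == "__main__":
    cmd = build_command("SimpleSAC.sac_main", "HalfCheetah-v2", "./experiment_output")
    # Print the command instead of running it; pass `cmd` to
    # subprocess.run(cmd, check=True) to actually start training.
    print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting issues when it is later handed to `subprocess.run`.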
All available command options can be seen in `SimpleSAC/sac_main.py` and `SimpleSAC/sac.py`.

You can run CQL experiments using the following command:
```
python -m SimpleSAC.conservative_sac_main \
--env 'halfcheetah-medium-v0' \
--logging.output_dir './experiment_output'
```

If you want to run on CPU only, just add the `--device='cpu'` option.
All available command options can be seen in `SimpleSAC/conservative_sac_main.py` and `SimpleSAC/conservative_sac.py`.

## Visualize Experiments
You can visualize the experiment metrics with viskit:
```
python -m viskit './experiment_output'
```
Then navigate to [http://localhost:5000/](http://localhost:5000/) in your browser.

## Weights and Biases Online Visualization Integration
This codebase can also log to the [W&B online visualization platform](https://wandb.ai/site). To log to W&B, you first need to set your W&B API key environment variable:
```
export WANDB_API_KEY='YOUR W&B API KEY HERE'
```
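The variable has to be exported in the same shell that later starts training. A quick way to check that Python processes can see it, sketched with a hypothetical helper `wandb_key_is_set`:

```python
import os


def wandb_key_is_set(environ=os.environ) -> bool:
    """Return True if WANDB_API_KEY is present and not blank."""
    return bool(environ.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_is_set():
        print("WANDB_API_KEY is set")
    else:
        print("WANDB_API_KEY is missing; export it before using --logging.online")
```

The check only looks at whether the variable is non-empty; it does not validate the key with W&B.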
Then you can run experiments with W&B logging turned on:
```
python -m SimpleSAC.conservative_sac_main \
--env 'halfcheetah-medium-v0' \
--logging.output_dir './experiment_output' \
--device='cuda' \
--logging.online
```

## Results of Running CQL on D4RL Environments
To save you time and compute resources, I've run a sweep of CQL on certain
D4RL environments with various min Q weight values. [The results can be seen here](https://wandb.ai/ygx/CQL--cql_min_q_weight_sweep_1).
You can choose the environment to visualize by filtering on `env`. The results for each `cql.cql_min_q_weight` on each `env`
are repeated and averaged across 3 random seeds.

## Credits
The project organization is inspired by [TD3](https://github.com/sfujim/TD3).
The SAC implementation is based on [rlkit](https://github.com/vitchyr/rlkit).
The CQL implementation is based on [CQL](https://github.com/aviralkumar2907/CQL).
The viskit visualization is taken from [viskit](https://github.com/vitchyr/viskit), which is taken from [rllab](https://github.com/rll/rllab).