https://github.com/dyth/doublegum
NeurIPS 2023 Spotlight
- Host: GitHub
- URL: https://github.com/dyth/doublegum
- Owner: dyth
- License: bsd-3-clause
- Created: 2023-10-06T11:19:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-27T20:17:14.000Z (8 months ago)
- Last Synced: 2024-04-28T21:25:19.479Z (8 months ago)
- Language: Python
- Homepage: https://openreview.net/forum?id=UdaTyy0BNB
- Size: 98.6 KB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# DoubleGum
Code for *Double Gumbel Q-Learning*
Data (5.4 MB): [https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing](https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing)
Logs (4.21 GB): [https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing](https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing)
## Installation
Developed on `Python 3.9` with `CUDA 12.2.1` and `cuDNN 8.8.0`.
```commandline
git clone git@github.com:dyth/doublegum.git
cd doublegum
```
create virtualenv
```commandline
virtualenv ~/doublegum
source ~/doublegum/bin/activate
```
or conda
```commandline
conda create --name doublegum python=3.9
conda activate doublegum
```
install mujoco
```commandline
mkdir .mujoco
cd .mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -xf mujoco210-linux-x86_64.tar.gz
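# note (not part of the original instructions): mujoco-py typically needs the
# extracted mujoco210/bin directory on LD_LIBRARY_PATH, e.g.:
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/mujoco210/bin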
```
install packages
```commandline
pip install -r requirements.txt
pip install "jax[cuda12_pip]==0.4.14" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
test that the code runs
```commandline
./test.sh
```

## Continuous Control
```commandline
main_cont.py --env <env> --policy <policy>
```
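For instance, a hypothetical run (the environment name `cheetah-run` is an assumption about the naming convention and is not taken from this README; `DoubleGum` is one of the `--policy` values listed below):
```commandline
main_cont.py --env cheetah-run --policy DoubleGum
```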
MetaWorld `env`s are run with `--env MetaWorld_`.

Policies benchmarked in our paper were:
* `DoubleGum`: DoubleGum (our algorithm)
* `DDPG`: DDPG (Deep Deterministic Policy Gradients), [[Lillicrap et al., 2015](https://arxiv.org/abs/1509.02971)]
* `TD3`: TD3 (Twin Delayed DDPG), [[Fujimoto et al., 2018](https://proceedings.mlr.press/v80/fujimoto18a.html)]
* `SAC`: SAC (Soft Actor Critic, defaults to use Twin Critics), [[Haarnoja et al., 2018](https://arxiv.org/abs/1812.05905)]
* `XQL --ensemble 1`: XQL (Extreme Q-Learning), [[Garg et al., 2023](https://openreview.net/forum?id=SJ0Lde3tRL)]
* `MoG-DDPG`: MoG-DDPG (Mixture of Gaussians Critics DDPG), [[Barth-Maron et al., 2018](https://openreview.net/forum?id=SyZipzbCb), [Shahriari et al., 2022](https://arxiv.org/abs/2204.10256)]

Policies we created/modified as additional benchmarks were:
* `QR-DDPG`: QR-DDPG (Quantile Regression [[Dabney et al., 2018](https://ojs.aaai.org/index.php/AAAI/article/view/11791)] with DDPG, defaults to use Twin Critics)
* `QR-DDPG --ensemble 1`: QR-DDPG without Twin Critics
* `SAC --ensemble 1`: SAC without Twin Critics
* `XQL`: XQL with Twin Critics
* `TD3 --ensemble 5 --pessimism <p>`: Finer TD3, where p is an integer between 0 and 4 (see the example below)
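A sketch of passing the pessimism level (assuming `--pessimism` takes the integer p directly; check the argument parser in `main_cont.py` to confirm):
```commandline
main_cont.py --env <env> --policy TD3 --ensemble 5 --pessimism 2
```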
Policies included in this repository but not benchmarked in our paper were:
* `IQL`: Implicit Q-Learning adapted to an online setting, [[Kostrikov et al., 2022](https://openreview.net/forum?id=68n2s9ZJWF8)]
* `SACLite`: SAC without the entropy term on the critic, [[Yu et al., 2022](https://arxiv.org/abs/2201.12434)]

## Discrete Control
```commandline
main_disc.py --env <env> --policy <policy>
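# e.g. (hypothetical example; <env> must be a supported environment name,
# and --policy takes one of the values listed below):
# main_disc.py --env <env> --policy DDQN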
```
Policies benchmarked in our paper were:
* `DoubleGum`: DoubleGum (our algorithm)
* `DQN`: DQN, [[Mnih et al., 2015](https://www.nature.com/articles/nature14236)]
* `DDQN`: DDQN (Double DQN), [[van Hasselt et al., 2016](https://ojs.aaai.org/index.php/AAAI/article/view/10295)]
* `DuellingDQN`: DuellingDQN, [[Wang et al., 2016](http://proceedings.mlr.press/v48/wangf16.html)]

Policies we created/modified as additional benchmarks were:
* `DuellingDDQN`: DuellingDDQN (Duelling Double DQN)

## Graphs and Tables
Reproduced using raw data from `Data` and `Logs`.
`Logs` (4.21 GB) contains data for Section 4 (Figures 1 and 2) and Appendix E.2 (Figures 6 and 7), while `Data` (5.4 MB) contains benchmark results for DoubleGum and baselines used in all other graphs, results and tables.

Run with:
```commandline
python plotting/fig.py
python tables/tab.py
```

## Acknowledgements
* Wrappers from [ikostrikov/jaxrl](https://github.com/ikostrikov/jaxrl)
* Distributional RL from [google-deepmind/acme](https://github.com/google-deepmind/acme)
* Control flow from [yifan12wu/td3-jax](https://github.com/yifan12wu/td3-jax)