{"id":20014279,"url":"https://github.com/dyth/doublegum","last_synced_at":"2025-05-04T22:31:04.796Z","repository":{"id":198681246,"uuid":"701308553","full_name":"dyth/doublegum","owner":"dyth","description":"NeurIPS 2023 Spotlight","archived":false,"fork":false,"pushed_at":"2024-04-27T20:17:14.000Z","size":101,"stargazers_count":9,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-28T21:25:19.479Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://openreview.net/forum?id=UdaTyy0BNB","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dyth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-06T11:19:56.000Z","updated_at":"2024-04-28T21:25:19.480Z","dependencies_parsed_at":null,"dependency_job_id":"b7f4e090-0c58-41ca-96b5-f1a5ad8e944e","html_url":"https://github.com/dyth/doublegum","commit_stats":null,"previous_names":["dyth/doublegum"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyth%2Fdoublegum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyth%2Fdoublegum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyth%2Fdoublegum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyth%2Fdoublegum/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dyth","download_url":"https://codeload.github.com/dyth/doublegum/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","re
positories_count":224413317,"owners_count":17306864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T07:39:53.983Z","updated_at":"2024-11-13T07:39:54.458Z","avatar_url":"https://github.com/dyth.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DoubleGum\n\nCode for *Double Gumbel Q-Learning*\n\nData (5.4 MB): [https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing](https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing)\n\nLogs (4.21 GB): [https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing](https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing)\n\n\n## Installation\n\nOn `Python 3.9` with `Cuda 12.2.1` and `cudnn 8.8.0`.\n\n```commandline\ngit clone git@github.com:dyth/doublegum.git\ncd doublegum\n```\n\ncreate virtualenv\n```\nvirtualenv \u003cVIRTUALENV_LOCATION\u003e/doublegum\nsource \u003cVIRTUALENV_LOCATION\u003e/doublegum/bin/activate\n```\nor conda\n```commandline\nconda create --name doublegum python=3.9\nconda activate doublegum\n```\n\ninstall mujoco\n```commandline\nmkdir .mujoco\ncd .mujoco\nwget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz\ntar -xf mujoco210-linux-x86_64.tar.gz\n```\n\ninstall packages\n```commandline\npip install -r requirements.txt\npip install \"jax[cuda12_pip]==0.4.14\" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html\n```\n\ntest that the code runs\n```commandline\n./test.sh\n```\n\n\n## Continuous 
Control\n\n```commandline\nmain_cont.py --env \u003cENV_NAME\u003e --policy \u003cPOLICY\u003e\n```\nMetaWorld `env`s are run with `--env MetaWorld_\u003cENVNAME\u003e`\n\nPolicies benchmarked in our paper were:\n* `DoubleGum`: DoubleGum (our algorithm)\n* `DDPG`: DDPG (Deep Deterministic Policy Gradients), [[Lillicrap et al., 2015](https://arxiv.org/abs/1509.02971)]\n* `TD3`: TD3 (Twin Delayed DDPG), [[Fujimoto et al., 2018](https://proceedings.mlr.press/v80/fujimoto18a.html)]\n* `SAC`: SAC (Soft Actor Critic, defaults to use Twin Critics), [[Haarnoja et al., 2018](https://arxiv.org/abs/1812.05905)]\n* `XQL --ensemble 1`: XQL (Extreme Q-Learning), [[Garg et al., 2023](https://openreview.net/forum?id=SJ0Lde3tRL)]\n* `MoG-DDPG`: MoG-DDPG (Mixture of Gaussians Critics DDPG), [[Barth-Maron et al., 2018](https://openreview.net/forum?id=SyZipzbCb), [Shahriari et al., 2022](https://arxiv.org/abs/2204.10256)]\n\nPolicies we created/modified as additional benchmarks were:\n* `QR-DDPG`: QR-DDPG (Quantile Regression [[Dabney et al., 2018](https://ojs.aaai.org/index.php/AAAI/article/view/11791)] with DDPG, defaults to use Twin Critics)\n* `QR-DDPG --ensemble 1`: QR-DDPG without Twin Critics\n* `SAC --ensemble 1`: SAC without Twin Critics\n* `XQL`: XQL with Twin Critics\n* `TD3 --ensemble 5 --pessimism \u003cp\u003e`: Finer TD3, where p is an integer between 0 and 4\n\nPolicies included in this repository but not benchmarked in our paper were:\n* `IQL`: Implicit Q-Learning adapted to an online setting, [[Kostrikov et al., 2022](https://openreview.net/forum?id=68n2s9ZJWF8)]\n* `SACLite`: SAC without the entropy term on the critic, [[Yu et al., 2022](https://arxiv.org/abs/2201.12434)]\n\n\n## Discrete Control\n\n```commandline\nmain_disc.py --env \u003cENV_NAME\u003e --policy \u003cPOLICY\u003e\n```\n\nPolicies benchmarked in our paper were:\n* `DoubleGum`: DoubleGum (our algorithm)\n* `DQN`: DQN, [[Mnih et al., 2015](https://www.nature.com/articles/nature14236)]\n* `DDQN`: DDQN 
(Double DQN), [[van Hasselt et al., 2016](https://ojs.aaai.org/index.php/AAAI/article/view/10295)]\n* `DuellingDQN`: DuellingDQN, [[Wang et al., 2016](http://proceedings.mlr.press/v48/wangf16.html)]\n\nPolicies we created/modified as additional benchmarks were:\n* `DuellingDDQN`: DuellingDDQN (Duelling Double DQN)\n\n\n## Graphs and Tables\n\nReproduced using raw data from `Data` and `Logs`.\n`Logs` (4.21 GB) contains data for Section 4 (Figures 1 and 2) and Appendix E.2 (Figures 6 and 7), while `Data` (5.4 MB) contains benchmark results for DoubleGum and baselines used in all other graphs, results and tables.\n\nRun with\n```commandline\npython plotting/fig\u003cx\u003e.py\npython tables/tab\u003cx\u003e.py\n```\n\n\n## Acknowledgements\n\n* Wrappers from [ikostrikov/jaxrl](https://github.com/ikostrikov/jaxrl)\n* Distributional RL from [google-deepmind/acme](https://github.com/google-deepmind/acme)\n* Control flow from [yifan12wu/td3-jax](https://github.com/yifan12wu/td3-jax)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyth%2Fdoublegum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdyth%2Fdoublegum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyth%2Fdoublegum/lists"}