{"id":18753511,"url":"https://github.com/lucadellalib/sac-beta","last_synced_at":"2026-05-07T03:39:13.912Z","repository":{"id":255955759,"uuid":"622700834","full_name":"lucadellalib/sac-beta","owner":"lucadellalib","description":"Soft actor-critic with beta policy via implicit reparameterization gradients","archived":false,"fork":false,"pushed_at":"2024-09-08T03:48:56.000Z","size":146,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-29T00:53:41.470Z","etag":null,"topics":["beta-distribution","deep-reinforcement-learning","gymnasium","mujoco","pytorch","reinforcement-learning","soft-actor-critic","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucadellalib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-02T21:37:47.000Z","updated_at":"2024-12-06T22:22:09.000Z","dependencies_parsed_at":"2024-09-08T04:39:03.688Z","dependency_job_id":"e930d808-7528-4863-98df-251efadda703","html_url":"https://github.com/lucadellalib/sac-beta","commit_stats":null,"previous_names":["lucadellalib/sac-beta"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fsac-beta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fsac-beta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fsac-beta/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Fsac-beta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucadellalib","download_url":"https://codeload.github.com/lucadellalib/sac-beta/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239644163,"owners_count":19673578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beta-distribution","deep-reinforcement-learning","gymnasium","mujoco","pytorch","reinforcement-learning","soft-actor-critic","tensorflow"],"created_at":"2024-11-07T17:26:05.235Z","updated_at":"2025-11-28T10:30:18.230Z","avatar_url":"https://github.com/lucadellalib.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients\n\nThis project investigates the use of [soft actor-critic (SAC)](https://arxiv.org/abs/1801.01290v2) with the beta\npolicy, which, compared to the normal policy, does not suffer from boundary effect bias and [has been shown to\nconverge faster](https://proceedings.mlr.press/v70/chou17a.html). Implicit reparameterization approaches based\non [automatic differentiation](https://arxiv.org/abs/1805.08498v4) and [optimal mass transport](https://arxiv.org/abs/1806.01851v2)\nare used to draw samples from the policy in a differentiable manner, as required by SAC. 
For the experimental\nevaluation we use four [MuJoCo](https://gymnasium.farama.org/environments/mujoco/) continuous control tasks.\n\n---------------------------------------------------------------------------------------------------------\n\n## 🛠️️ Installation\n\nFirst of all, install [Miniconda](https://docs.conda.io/en/latest/miniconda.html).\nClone or download and extract the repository, navigate to `\u003cpath-to-repository\u003e`, open a terminal and run:\n\n```bash\nconda env create -f environment.yml\n```\n\nProject dependencies (pinned to a specific version to reduce compatibility and reproducibility issues)\nwill be installed in a [Conda](https://www.anaconda.com/) virtual environment named `sac-beta`.\n\nTo activate it, run:\n\n```bash\nconda activate sac-beta\n```\n\nTo deactivate it, run:\n\n```bash\nconda deactivate\n```\n\nTo permanently delete it, run:\n\n```bash\nconda remove -n sac-beta --all\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## ▶️ Quickstart\n\n### Running an experiment\n\nTo train one of the available algorithms on a MuJoCo task, open a terminal in `scripts` and run:\n\n```bash\nconda activate sac-beta\npython \u003calgorithm\u003e.py --task \u003ctask\u003e\n```\n\nLogs and experimental results (metrics, checkpoints, etc.) 
can be found in the auto-generated `logs`\nand `experiments` directories, respectively.\n\n### Reproducing the experimental results\n\nThe experiments were run on a CentOS Linux 7 machine with an Intel Xeon Gold 6148 Skylake CPU with 20 cores\n@ 2.40 GHz, 32 GB RAM and an NVIDIA Tesla V100 SXM2 GPU (16 GB) with CUDA Toolkit 11.4.2.\n\n#### Performance comparison\n\n**NOTE**: `run_experiment.py` starts several processes in parallel under the hood, one for each experiment\n(make sure to have enough RAM and/or GPU memory, or adapt the script to your needs).\n\nTo reproduce the experimental results, open a terminal and run:\n\n```bash\nconda activate sac-beta\n\npython run_experiment.py sac_beta_ad Ant-v4\npython run_experiment.py sac_beta_omt Ant-v4\npython run_experiment.py sac_normal Ant-v4\npython run_experiment.py sac_tanh_normal Ant-v4\n\npython run_experiment.py sac_beta_ad HalfCheetah-v4\npython run_experiment.py sac_beta_omt HalfCheetah-v4\npython run_experiment.py sac_normal HalfCheetah-v4\npython run_experiment.py sac_tanh_normal HalfCheetah-v4\n\npython run_experiment.py sac_beta_ad Hopper-v4\npython run_experiment.py sac_beta_omt Hopper-v4\npython run_experiment.py sac_normal Hopper-v4\npython run_experiment.py sac_tanh_normal Hopper-v4\n\npython run_experiment.py sac_beta_ad Walker2d-v4\npython run_experiment.py sac_beta_omt Walker2d-v4\npython run_experiment.py sac_normal Walker2d-v4\npython run_experiment.py sac_tanh_normal Walker2d-v4\n```\n\nWait for the experiments to finish. 
To plot the results, open a terminal and run:\n\n```bash\npython plotter.py --root-dir ../experiments/Ant-v4 --smooth 1 --shaded-std --legend-pattern \"^([\\w-]+)\" --title Ant-v4 -u --output-path Ant-v4.pdf\npython plotter.py --root-dir ../experiments/HalfCheetah-v4 --smooth 1 --shaded-std --legend-pattern \"$^\" --title HalfCheetah-v4 --ylabel \"\" -u --output-path HalfCheetah-v4.pdf\npython plotter.py --root-dir ../experiments/Hopper-v4 --smooth 1 --shaded-std --legend-pattern \"$^\" --title Hopper-v4 --ylabel \"\" -u --output-path Hopper-v4.pdf\npython plotter.py --root-dir ../experiments/Walker2d-v4 --smooth 1 --shaded-std --legend-pattern \"$^\" --title Walker2d-v4 --ylabel \"\" -u --output-path Walker2d-v4.pdf\n```\n\n#### Ablation study\n\n**NOTE**: `run_experiment.py` starts several processes in parallel under the hood, one for each experiment\n(make sure to have enough RAM and/or GPU memory, or adapt the script to your needs).\n\nTo reproduce the experimental results, open a terminal and run:\n\n```bash\nconda activate sac-beta\n\npython run_experiment.py sac_beta_omt Ant-v4 --experiment-dir ../experiments/ablation\npython run_experiment.py sac_beta_omt_no_clip Ant-v4 --experiment-dir ../experiments/ablation\npython run_experiment.py sac_beta_omt_non_concave Ant-v4 --experiment-dir ../experiments/ablation\npython run_experiment.py sac_beta_omt_softplus Ant-v4 --experiment-dir ../experiments/ablation\n```\n\nWait for the experiments to finish. 
To plot the results, open a terminal and run:\n\n```bash\npython plotter.py --root-dir ../experiments/ablation/Ant-v4 --smooth 1 --shaded-std --legend-pattern \"^([\\w-]+)\" --title Ant-v4 --fig-length 5 --fig-width 3 -u --output-path ablation.pdf\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## 📧 Contact\n\n[luca.dellalib@gmail.com](mailto:luca.dellalib@gmail.com)\n\n---------------------------------------------------------------------------------------------------------\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Fsac-beta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucadellalib%2Fsac-beta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Fsac-beta/lists"}