{"id":13492910,"url":"https://github.com/rail-berkeley/softlearning","last_synced_at":"2025-04-08T09:05:57.306Z","repository":{"id":33816666,"uuid":"160139764","full_name":"rail-berkeley/softlearning","owner":"rail-berkeley","description":"Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.","archived":false,"fork":false,"pushed_at":"2023-11-29T14:16:25.000Z","size":13756,"stargazers_count":1280,"open_issues_count":53,"forks_count":245,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-04-01T07:43:30.691Z","etag":null,"topics":["deep-learning","deep-neural-networks","deep-reinforcement-learning","machine-learning","reinforcement-learning","soft-actor-critic"],"latest_commit_sha":null,"homepage":"https://sites.google.com/view/sac-and-applications","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rail-berkeley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-12-03T05:55:54.000Z","updated_at":"2025-04-01T05:28:11.000Z","dependencies_parsed_at":"2024-02-09T04:46:28.893Z","dependency_job_id":null,"html_url":"https://github.com/rail-berkeley/softlearning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Fsoftlearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Fsoftlearning/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Fsoftlearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rail-berkeley%2Fsoftlearning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rail-berkeley","download_url":"https://codeload.github.com/rail-berkeley/softlearning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247809963,"owners_count":20999816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-neural-networks","deep-reinforcement-learning","machine-learning","reinforcement-learning","soft-actor-critic"],"created_at":"2024-07-31T19:01:10.364Z","updated_at":"2025-04-08T09:05:57.285Z","avatar_url":"https://github.com/rail-berkeley.png","language":"Python","funding_links":[],"categories":["Uncategorized","Libraries","Python","Models and Projects"],"sub_categories":["Uncategorized","Ray Tune (Hyperparameter Optimization)"],"readme":"# Softlearning\n\nSoftlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is fairly thin and primarily optimized for our own development purposes. It utilizes the tf.keras modules for most of the model classes (e.g. policies and value functions). We use Ray for the experiment orchestration. Ray Tune and Autoscaler implement several neat features that enable us to seamlessly run the same experiment scripts that we use for local prototyping to launch large-scale experiments on any chosen cloud service (e.g. 
GCP or AWS), and intelligently parallelize and distribute training for effective resource allocation.\n\nThis implementation uses TensorFlow. For a PyTorch implementation of soft actor-critic, take a look at [rlkit](https://github.com/vitchyr/rlkit).\n\n# Getting Started\n\n## Prerequisites\n\nThe environment can be run either locally using conda or inside a docker container. For conda installation, you need to have [Conda](https://conda.io/docs/user-guide/install/index.html) installed. For docker installation you will need to have [Docker](https://docs.docker.com/engine/installation/) and [Docker Compose](https://docs.docker.com/compose/install/) installed. Also, most of our environments currently require a [MuJoCo](https://www.roboti.us/license.html) license.\n\n## Conda Installation\n\n1. [Download](https://www.roboti.us/index.html) and install MuJoCo 1.50 and 2.00 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (`~/.mujoco/mjpro150` and `~/.mujoco/mujoco200_{platform}`). Unfortunately, `gym` and `dm_control` expect different paths for the MuJoCo 2.00 installation, which is why you will need to have it installed both in `~/.mujoco/mujoco200_{platform}` and `~/.mujoco/mujoco200`. The easiest way is to create a symlink from `~/.mujoco/mujoco200_{platform}` -\u003e `~/.mujoco/mujoco200` with: `ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200`.\n\n2. Copy your MuJoCo license key (`mjkey.txt`) to `~/.mujoco/mjkey.txt`.\n\n3. Clone `softlearning`\n```\ngit clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}\n```\n\n4. Create and activate the conda environment, then install softlearning to enable the command line interface.\n```\ncd ${SOFTLEARNING_PATH}\nconda env create -f environment.yml\nconda activate softlearning\npip install -e ${SOFTLEARNING_PATH}\n```\n\nThe environment should be ready to run. 
See examples section for examples of how to train and simulate the agents.\n\nFinally, to deactivate and remove the conda environment:\n```\nconda deactivate\nconda remove --name softlearning --all\n```\n\n## Docker Installation\n\n### docker-compose\nTo build the image and run the container:\n```\nexport MJKEY=\"$(cat ~/.mujoco/mjkey.txt)\" \\\n    \u0026\u0026 docker-compose \\\n        -f ./docker/docker-compose.dev.cpu.yml \\\n        up \\\n        -d \\\n        --force-recreate\n```\n\nYou can access the container with the typical Docker [exec](https://docs.docker.com/engine/reference/commandline/exec/)-command, i.e.\n\n```\ndocker exec -it softlearning bash\n```\n\nSee examples section for examples of how to train and simulate the agents.\n\nFinally, to clean up the docker setup:\n```\ndocker-compose \\\n    -f ./docker/docker-compose.dev.cpu.yml \\\n    down \\\n    --rmi all \\\n    --volumes\n```\n\n## Examples\n### Training and simulating an agent\n1. To train the agent\n```\nsoftlearning run_example_local examples.development \\\n    --algorithm SAC \\\n    --universe gym \\\n    --domain HalfCheetah \\\n    --task v3 \\\n    --exp-name my-sac-experiment-1 \\\n    --checkpoint-frequency 1000  # Save the checkpoint to resume training later\n```\n\n2. To simulate the resulting policy:\nFirst, find the *absolute* path that the checkpoint is saved to. By default (i.e. without specifying the `log-dir` argument to the previous script), the data is saved under `~/ray_results/\u003cuniverse\u003e/\u003cdomain\u003e/\u003ctask\u003e/\u003cdatatimestamp\u003e-\u003cexp-name\u003e/\u003ctrial-id\u003e/\u003ccheckpoint-id\u003e`. For example: `~/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-sac-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/`. 
The next command assumes that this path is found from `${SAC_CHECKPOINT_DIR}` environment variable.\n\n```\npython -m examples.development.simulate_policy \\\n    ${SAC_CHECKPOINT_DIR} \\\n    --max-path-length 1000 \\\n    --num-rollouts 1 \\\n    --render-kwargs '{\"mode\": \"human\"}'\n```\n\n`examples.development.main` contains several different environments and there are more example scripts available in the  `/examples` folder. For more information about the agents and configurations, run the scripts with `--help` flag: `python ./examples/development/main.py --help`\n```\noptional arguments:\n  -h, --help            show this help message and exit\n  --universe {robosuite,dm_control,gym}\n  --domain DOMAIN\n  --task TASK\n  --checkpoint-replay-pool CHECKPOINT_REPLAY_POOL\n                        Whether a checkpoint should also saved the replay\n                        pool. If set, takes precedence over\n                        variant['run_params']['checkpoint_replay_pool']. Note\n                        that the replay pool is saved (and constructed) piece\n                        by piece so that each experience is saved only once.\n  --algorithm ALGORITHM\n  --policy {gaussian}\n  --exp-name EXP_NAME\n  --mode MODE\n  --run-eagerly RUN_EAGERLY\n                        Whether to run tensorflow in eager mode.\n  --local-dir LOCAL_DIR\n                        Destination local folder to save training results.\n  --confirm-remote [CONFIRM_REMOTE]\n                        Whether or not to query yes/no on remote run.\n  --video-save-frequency VIDEO_SAVE_FREQUENCY\n                        Save frequency for videos.\n  --cpus CPUS           Cpus to allocate to ray process. Passed to `ray.init`.\n  --gpus GPUS           Gpus to allocate to ray process. Passed to `ray.init`.\n  --resources RESOURCES\n                        Resources to allocate to ray process. 
Passed to\n                        `ray.init`.\n  --include-webui INCLUDE_WEBUI\n                        Boolean flag indicating whether to start theweb UI,\n                        which is a Jupyter notebook. Passed to `ray.init`.\n  --temp-dir TEMP_DIR   If provided, it will specify the root temporary\n                        directory for the Ray process. Passed to `ray.init`.\n  --resources-per-trial RESOURCES_PER_TRIAL\n                        Resources to allocate for each trial. Passed to\n                        `tune.run`.\n  --trial-cpus TRIAL_CPUS\n                        CPUs to allocate for each trial. Note: this is only\n                        used for Ray's internal scheduling bookkeeping, and is\n                        not an actual hard limit for CPUs. Passed to\n                        `tune.run`.\n  --trial-gpus TRIAL_GPUS\n                        GPUs to allocate for each trial. Note: this is only\n                        used for Ray's internal scheduling bookkeeping, and is\n                        not an actual hard limit for GPUs. Passed to\n                        `tune.run`.\n  --trial-extra-cpus TRIAL_EXTRA_CPUS\n                        Extra CPUs to reserve in case the trials need to\n                        launch additional Ray actors that use CPUs.\n  --trial-extra-gpus TRIAL_EXTRA_GPUS\n                        Extra GPUs to reserve in case the trials need to\n                        launch additional Ray actors that use GPUs.\n  --num-samples NUM_SAMPLES\n                        Number of times to repeat each trial. Passed to\n                        `tune.run`.\n  --upload-dir UPLOAD_DIR\n                        Optional URI to sync training results to (e.g.\n                        s3://\u003cbucket\u003e or gs://\u003cbucket\u003e). Passed to `tune.run`.\n  --trial-name-template TRIAL_NAME_TEMPLATE\n                        Optional string template for trial name. 
For example:\n                        '{trial.trial_id}-seed={trial.config[run_params][seed]\n                        }' Passed to `tune.run`.\n  --checkpoint-frequency CHECKPOINT_FREQUENCY\n                        How many training iterations between checkpoints. A\n                        value of 0 (default) disables checkpointing. If set,\n                        takes precedence over\n                        variant['run_params']['checkpoint_frequency']. Passed\n                        to `tune.run`.\n  --checkpoint-at-end CHECKPOINT_AT_END\n                        Whether to checkpoint at the end of the experiment. If\n                        set, takes precedence over\n                        variant['run_params']['checkpoint_at_end']. Passed to\n                        `tune.run`.\n  --max-failures MAX_FAILURES\n                        Try to recover a trial from its last checkpoint at\n                        least this many times. Only applies if checkpointing\n                        is enabled. Passed to `tune.run`.\n  --restore RESTORE     Path to checkpoint. Only makes sense to set if running\n                        1 trial. Defaults to None. Passed to `tune.run`.\n  --server-port SERVER_PORT\n                        Port number for launching TuneServer. Passed to\n                        `tune.run`.\n```\n\n### Resume training from a saved checkpoint\n\n## This feature is currently broken!\n\nIn order to resume training from previous checkpoint, run the original example main-script, with an additional `--restore` flag. 
For example, the previous example can be resumed as follows:\n\n```\nsoftlearning run_example_local examples.development \\\n    --algorithm SAC \\\n    --universe gym \\\n    --domain HalfCheetah \\\n    --task v3 \\\n    --exp-name my-sac-experiment-1 \\\n    --checkpoint-frequency 1000 \\\n    --restore ${SAC_CHECKPOINT_PATH}\n```\n\n# References\nThe algorithms are based on the following papers:\n\n*Soft Actor-Critic Algorithms and Applications*.\u003c/br\u003e\nTuomas Haarnoja*, Aurick Zhou*, Kristian Hartikainen*, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine.\narXiv preprint, 2018.\u003c/br\u003e\n[paper](https://arxiv.org/abs/1812.05905)  |  [videos](https://sites.google.com/view/sac-and-applications)\n\n*Latent Space Policies for Hierarchical Reinforcement Learning*.\u003c/br\u003e\nTuomas Haarnoja*, Kristian Hartikainen*, Pieter Abbeel, and Sergey Levine.\nInternational Conference on Machine Learning (ICML), 2018.\u003c/br\u003e\n[paper](https://arxiv.org/abs/1804.02808) | [videos](https://sites.google.com/view/latent-space-deep-rl)\n\n*Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor*.\u003c/br\u003e\nTuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine.\nInternational Conference on Machine Learning (ICML), 2018.\u003c/br\u003e\n[paper](https://arxiv.org/abs/1801.01290) | [videos](https://sites.google.com/view/soft-actor-critic)\n\n*Composable Deep Reinforcement Learning for Robotic Manipulation*.\u003c/br\u003e\nTuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine.\nInternational Conference on Robotics and Automation (ICRA), 2018.\u003c/br\u003e\n[paper](https://arxiv.org/abs/1803.06773) | [videos](https://sites.google.com/view/composing-real-world-policies)\n\n*Reinforcement Learning with Deep Energy-Based Policies*.\u003c/br\u003e\nTuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey 
Levine.\nInternational Conference on Machine Learning (ICML), 2017.\u003c/br\u003e\n[paper](https://arxiv.org/abs/1702.08165) | [videos](https://sites.google.com/view/softqlearning/home)\n\nIf Softlearning helps you in your academic research, you are encouraged to cite our paper. Here is an example bibtex:\n```\n@techreport{haarnoja2018sacapps,\n  title={Soft Actor-Critic Algorithms and Applications},\n  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},\n  journal={arXiv preprint arXiv:1812.05905},\n  year={2018}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frail-berkeley%2Fsoftlearning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frail-berkeley%2Fsoftlearning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frail-berkeley%2Fsoftlearning/lists"}