Soft Actor-Critic
https://github.com/haarnoja/sac
- Host: GitHub
- URL: https://github.com/haarnoja/sac
- Owner: haarnoja
- License: other
- Created: 2017-12-07T01:17:27.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-11-29T20:49:46.000Z (over 2 years ago)
- Last Synced: 2024-12-06T19:17:16.540Z (over 1 year ago)
- Language: Python
- Size: 604 KB
- Stars: 1,010
- Watchers: 29
- Forks: 236
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-deep-reinforcement-learning - haarnoja/sac
README
**This repository is no longer maintained. Please use our new [Softlearning](https://github.com/rail-berkeley/softlearning) package instead.**
# Soft Actor-Critic
Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290) presented at ICML 2018.
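Concretely, the maximum entropy objective augments the expected return with the policy's entropy at each visited state, weighted by a temperature coefficient α (notation follows the paper; ρ_π denotes the state-action marginals induced by the policy):

```
J(\pi) = \sum_{t} \mathbb{E}_{(\mathbf{s}_t, \mathbf{a}_t) \sim \rho_\pi}
         \left[ r(\mathbf{s}_t, \mathbf{a}_t)
              + \alpha \, \mathcal{H}\big(\pi(\,\cdot\,|\,\mathbf{s}_t)\big) \right]
```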
This implementation uses Tensorflow. For a PyTorch implementation of soft actor-critic, take a look at [rlkit](https://github.com/vitchyr/rlkit) by [Vitchyr Pong](https://github.com/vitchyr).
See the [DIAYN documentation](./DIAYN.md) for using SAC for learning diverse skills.
# Getting Started
Soft Actor-Critic can be run either locally or through Docker.
## Prerequisites
You will need to have [Docker](https://docs.docker.com/engine/installation/) and [Docker Compose](https://docs.docker.com/compose/install/) installed unless you want to run the environment locally.
Most of the models require a [Mujoco](https://www.roboti.us/license.html) license.
## Docker installation
If you want to run the Mujoco environments, the docker environment needs to know where to find your Mujoco license key (`mjkey.txt`). You can either copy your key into `/.mujoco/mjkey.txt`, or you can specify the path to the key in your environment variables:
```
export MUJOCO_LICENSE_PATH=/mjkey.txt
```
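As a quick sanity check before starting the container, you can confirm that the variable points at an existing file. The temporary key file below is only a stand-in for illustration; substitute the path to your real `mjkey.txt`:

```shell
# Stand-in key file for illustration; use your real mjkey.txt path instead.
touch /tmp/mjkey_demo.txt
export MUJOCO_LICENSE_PATH=/tmp/mjkey_demo.txt

# The container can only pick up the key if this check passes.
if [ -f "$MUJOCO_LICENSE_PATH" ]; then
  echo "license key found at $MUJOCO_LICENSE_PATH"
else
  echo "no license key at $MUJOCO_LICENSE_PATH" >&2
fi
```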
Once that's done, you can run the Docker container with
```
docker-compose up
```
Docker compose creates a Docker container named `soft-actor-critic` and automatically sets the needed environment variables and volumes.
You can access the container with the typical Docker [exec](https://docs.docker.com/engine/reference/commandline/exec/) command, e.g.
```
docker exec -it soft-actor-critic bash
```
See the Examples section below for how to train and simulate the agents.
To clean up the setup:
```
docker-compose down
```
## Local installation
To get the environment installed correctly, you will first need to clone [rllab](https://github.com/rll/rllab) and add its path to your `PYTHONPATH` environment variable.
1. Clone rllab
```
cd
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
```
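Note that the `export` above only lasts for the current shell session. To make it persistent, you can append the same line to your shell profile; the rllab path and the profile file below are illustrative (in practice you would use `~/.bashrc` or your shell's equivalent):

```shell
# Illustrative: persist the rllab path so future shells pick it up.
RLLAB_PATH="$HOME/rllab"
PROFILE="/tmp/demo_bashrc"   # stand-in; use ~/.bashrc (or your shell's profile) in practice
echo "export PYTHONPATH=$RLLAB_PATH:\${PYTHONPATH}" >> "$PROFILE"
tail -n 1 "$PROFILE"
```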
2. [Download](https://www.roboti.us/index.html) and copy mujoco files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the `.dylib` files instead of `.so` files.
```
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir /rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so /rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 /rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
```
3. Copy your Mujoco license key (mjkey.txt) to rllab path:
```
cp /mjkey.txt /rllab/vendor/mujoco
```
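Steps 2 and 3 should leave the rllab vendor directory containing the two shared libraries plus your key. The snippet below recreates that expected layout in a temporary directory, purely for illustration:

```shell
# Recreate the expected layout in a temp dir (stand-in for your rllab checkout).
RLLAB=$(mktemp -d)
mkdir -p "$RLLAB/vendor/mujoco"
touch "$RLLAB/vendor/mujoco/libmujoco131.so" \
      "$RLLAB/vendor/mujoco/libglfw.so.3" \
      "$RLLAB/vendor/mujoco/mjkey.txt"
ls "$RLLAB/vendor/mujoco"
```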
4. Clone sac
```
cd
git clone https://github.com/haarnoja/sac.git
cd sac
```
5. Create and activate conda environment
```
cd sac
conda env create -f environment.yml
source activate sac
```
The environment should now be ready to run. See the Examples section below for how to train and simulate the agents.
Finally, to deactivate and remove the conda environment:
```
source deactivate
conda remove --name sac --all
```
## Examples
### Training and simulating an agent
1. To train the agent
```
python ./examples/mujoco_all_sac.py --env=swimmer --log_dir="/root/sac/data/swimmer-experiment"
```
2. To simulate the agent (*NOTE*: This step currently fails with the Docker installation, due to missing display.)
```
python ./scripts/sim_policy.py /root/sac/data/swimmer-experiment/itr_.pkl
```
`mujoco_all_sac.py` supports several different environments, and there are more example scripts available in the `/examples` folder. For more information about the agents and configurations, run the scripts with the `--help` flag. For example:
```
python ./examples/mujoco_all_sac.py --help
usage: mujoco_all_sac.py [-h]
                         [--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
                         [--exp_name EXP_NAME] [--mode MODE]
                         [--log_dir LOG_DIR]
```
# Benchmark Results
Benchmark results for some of the OpenAI Gym v2 environments can be found [here](https://drive.google.com/open?id=1I0NUrAzU7wwJQiX_MSmr1LvshjDZ4gSh).
# Credits
The soft actor-critic algorithm was developed by Tuomas Haarnoja under the supervision of Prof. [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/) and Prof. [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/) at UC Berkeley. Special thanks to [Vitchyr Pong](https://github.com/vitchyr), who wrote some parts of the code, and [Kristian Hartikainen](https://github.com/hartikainen) who helped testing, documenting, and polishing the code and streamlining the installation process. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).
# Reference
```
@inproceedings{haarnoja2017soft,
  title={Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor},
  author={Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey},
  booktitle={Deep Reinforcement Learning Symposium},
  year={2017}
}
```