{"id":13612717,"url":"https://github.com/tensorlayer/RLzoo","last_synced_at":"2025-04-13T12:32:46.803Z","repository":{"id":35970230,"uuid":"198997104","full_name":"tensorlayer/RLzoo","owner":"tensorlayer","description":"A Comprehensive Reinforcement Learning Zoo for Simple Usage 🚀","archived":false,"fork":false,"pushed_at":"2023-03-24T22:35:03.000Z","size":21733,"stargazers_count":635,"open_issues_count":8,"forks_count":96,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-04-09T07:04:31.082Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","mindspore","paddepaddle","reinforcement-learning","reinforcement-learning-practices","tensorflow","tensorlayer"],"latest_commit_sha":null,"homepage":"http://rlzoo.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tensorlayer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-07-26T10:23:27.000Z","updated_at":"2025-03-21T16:08:39.000Z","dependencies_parsed_at":"2022-07-14T06:00:30.821Z","dependency_job_id":"cda444eb-4f83-40bf-af69-93185fc5e414","html_url":"https://github.com/tensorlayer/RLzoo","commit_stats":{"total_commits":314,"total_committers":9,"mean_commits":"34.888888888888886","dds":"0.42356687898089174","last_synced_commit":"e3ed8a57bd8130bd7b663f213a388ce972925f30"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorlayer%2FRLzoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorlayer%2FRLzoo/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/tensorlayer%2FRLzoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorlayer%2FRLzoo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tensorlayer","download_url":"https://codeload.github.com/tensorlayer/RLzoo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248714725,"owners_count":21149953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","mindspore","paddepaddle","reinforcement-learning","reinforcement-learning-practices","tensorflow","tensorlayer"],"created_at":"2024-08-01T20:00:33.520Z","updated_at":"2025-04-13T12:32:41.784Z","avatar_url":"https://github.com/tensorlayer.png","language":"Python","readme":"# Reinforcement Learning Zoo\n[![Documentation Status](https://readthedocs.org/projects/rlzoo/badge/?version=latest)](https://rlzoo.readthedocs.io/en/latest/?badge=latest)\n[![Supported TF Version](https://img.shields.io/badge/TensorFlow-2.0.0%2B-brightgreen.svg)](https://github.com/tensorflow/tensorflow/releases)\n[![Downloads](http://pepy.tech/badge/rlzoo)](http://pepy.tech/project/rlzoo)\n\n\u003cbr/\u003e\n\u003ca href=\"https://deepreinforcementlearningbook.org\" target=\"\\_blank\"\u003e\n\t\u003cdiv align=\"center\"\u003e\n\t\t\u003cimg src=\"docs/img/rlzoo-logo.png\" width=\"40%\"/\u003e\n\t\u003c/div\u003e\n\u003c!-- \t\u003cdiv align=\"center\"\u003e\u003ccaption\u003eSlack Invitation Link\u003c/caption\u003e\u003c/div\u003e 
--\u003e\n\u003c/a\u003e\n\u003cbr/\u003e\n\nRLzoo is a collection of the most practical reinforcement learning algorithms, frameworks and applications. It is implemented with TensorFlow 2.0 and the neural network layer API of [**TensorLayer 2.0+**](https://github.com/tensorlayer/tensorlayer), to provide a hands-on, fast-developing approach for reinforcement learning practice and benchmarks. It supports basic toy tests like [OpenAI Gym](https://gym.openai.com/) and [DeepMind Control Suite](https://github.com/deepmind/dm_control) with very simple configurations. Moreover, RLzoo supports the robot learning benchmark environment [RLBench](https://github.com/stepjam/RLBench) based on the [Vrep](http://www.coppeliarobotics.com/)/[PyRep](https://github.com/stepjam/PyRep) simulator. Other large-scale distributed training frameworks for more realistic scenarios with [Unity 3D](https://github.com/Unity-Technologies/ml-agents), \n[Mujoco](http://www.mujoco.org/), [Bullet Physics](https://github.com/bulletphysics/bullet3), etc., will be supported in the future. 
A [Springer textbook](https://deepreinforcementlearningbook.org) is also provided; you can get the free PDF if your institute has a Springer license.\n\nDifferent from RLzoo for simple usage with **high-level APIs**, we also have an [RL tutorial](https://github.com/tensorlayer/tensorlayer/tree/master/examples/reinforcement_learning) that aims to make reinforcement learning simple, transparent and straightforward with **low-level APIs**, as this not only benefits new learners of reinforcement learning, but also provides a convenient way for senior researchers to test their new ideas quickly.\n\n\u003c!-- \u003cem\u003eGym: Atari\u003c/em\u003e    \u003cem\u003eGym: Box2D \u003c/em\u003e   \u003cem\u003eGym: Classic Control \u003c/em\u003e  \u003cem\u003eGym: MuJoCo \u003c/em\u003e--\u003e\n\n\u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/atari.gif\" height=250 width=205 \u003e \u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/box2d.gif\" height=250 width=205 \u003e\u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/classic.gif\" height=250 width=205 \u003e \u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/mujoco.gif\" height=250 width=205 \u003e\n\n\u003c!-- \u003cem\u003eGym: Robotics\u003c/em\u003e    \u003cem\u003eDeepMind Control Suite \u003c/em\u003e   \u003cem\u003eGym: RLBench \u003c/em\u003e  --\u003e\n\n\u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/robotics.gif\" height=250 width=205 \u003e \u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/dmcontrol.gif\" height=250 width=205 \u003e \u003cimg src=\"https://github.com/tensorlayer/RLzoo/blob/master/gif/rlbench.gif\" height=250 width=205 \u003e \n\u003cimg src=\"https://github.com/tensorlayer/tensorlayer/blob/master/img/tl_transparent_logo.png\" height=180 width=200 \u003e\n\n\nPlease check our [**Online Documentation**](https://rlzoo.readthedocs.io) for detailed usage and the 
[**arXiv paper**](https://arxiv.org/abs/2009.08644) for the high-level description of design choices plus comparisons with other RL libraries. We suggest users report bugs using GitHub issues. Users can also discuss how to use RLzoo in the following Slack channel.\n\n\u003cbr/\u003e\n\n\u003ca href=\"https://join.slack.com/t/tensorlayer/shared_invite/enQtODk1NTQ5NTY1OTM5LTQyMGZhN2UzZDBhM2I3YjYzZDBkNGExYzcyZDNmOGQzNmYzNjc3ZjE3MzhiMjlkMmNiMmM3Nzc4ZDY2YmNkMTY\" target=\"\\_blank\"\u003e\n\t\u003cdiv align=\"center\"\u003e\n\t\t\u003cimg src=\"https://github.com/tensorlayer/tensorlayer/raw/master/img/join_slack.png\" width=\"40%\"/\u003e\n\t\u003c/div\u003e\n\u003c/a\u003e\n\n\u003cbr/\u003e\n\n[*News*] RLzoo's paper was accepted at the ACM Multimedia 2021 Open Source Software Competition! See a simple [presentation slide](https://github.com/tensorlayer/RLzoo/blob/master/gif/ACM_MM2021_Presentation_Slide.pdf) describing the key characteristics of RLzoo.\n\n\n**Table of contents:**\n\n- [Status](#status)\n- [Installation](#installation)\n- [Prerequisites](#prerequisites)\n- [Usage](#usage)\n- [Contents](#contents)\n  - [Algorithms](#algorithms)\n  - [Environments](#environments)\n  - [Configurations](#configurations)\n- [Properties](#properties)\n- [Troubleshooting](#troubleshooting)\n- [Credits](#credits)\n- [Citing](#citing)\n\n\n## Status: Release\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eCurrent status\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\nWe are currently open to any suggestions or pull requests from the community to make RLzoo a better repository. Given the scope of this project, we expect there could be some issues over\nthe coming months after the initial release. We will keep fixing potential problems and commit when significant changes are made in the future. 
The current default hyperparameters for each algorithm and each environment may not be optimal, so you may need to tune them to achieve the best performance. We will release a version with optimal hyperparameters and benchmark results for all algorithms in the future.\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eVersion History\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\t\n* 1.0.4 (Current version)\n\n  Changes:\n\n  * Add distributed training for the DPPO algorithm, using KungFu\n\n* 1.0.3 \n\n  Changes:\n\n  * Fix bugs in the SAC algorithm\n\n* 1.0.1 \n\n\tChanges: \n\t* Add [interactive training configuration](https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/interactive/main.ipynb);\n\t* Better support for the RLBench environment, with multi-head network architectures to support dictionaries as the observation type;\n\t* Make the code cleaner.\n* 0.0.1\n\u003c/div\u003e\n\u003c/details\u003e\n\n## Installation\nEnsure that you have **Python \u003e=3.5** (Python 3.6 is needed if using DeepMind Control Suite).\n\nDirect installation: \n```\npip3 install rlzoo --upgrade\n```\n\nInstall RLzoo from Git:\n```\ngit clone https://github.com/tensorlayer/RLzoo.git\ncd RLzoo\npip3 install .\n```\n\n## Prerequisites\n```pip3 install -r requirements.txt```\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eList of prerequisites.\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n* tensorflow \u003e= 2.0.0 or tensorflow-gpu \u003e= 2.0.0a0\n* tensorlayer \u003e= 2.0.1\n* tensorflow-probability\n* tf-nightly-2.0-preview\n* [Mujoco 2.0](http://www.mujoco.org/), [dm_control](https://github.com/deepmind/dm_control), [dm2gym](https://github.com/zuoxingdong/dm2gym) (if using DeepMind Control Suite environments)\n* Vrep, PyRep, RLBench (if using RLBench environments, follow the instructions [here](http://www.coppeliarobotics.com/downloads.html), 
[here](https://github.com/stepjam/PyRep) and [here](https://github.com/stepjam/RLBench))\n\u003c/div\u003e\n\u003c/details\u003e\n\n## Usage\n\nFor detailed usage, please check our [**online documentation**](https://rlzoo.readthedocs.io).\n\n### Quick Start\nChoose whatever environment and RL algorithm supported in RLzoo, and enjoy the game by running the following example in the root folder of the installed package:\n```bash\n# in the root folder of RLzoo package\ncd rlzoo\npython run_rlzoo.py\n```\n\nWhat's in `run_rlzoo.py`?\n\n```python\nfrom rlzoo.common.env_wrappers import build_env\nfrom rlzoo.common.utils import call_default_params\nfrom rlzoo.algorithms import TD3  # import the algorithm to use\n# choose an algorithm\nAlgName = 'TD3'\n# choose an environment\nEnvName = 'Pendulum-v0'  \n# select a corresponding environment type\nEnvType = 'classic_control'\n# build an environment with wrappers\nenv = build_env(EnvName, EnvType)  \n# call default parameters for the algorithm and learning process\nalg_params, learn_params = call_default_params(env, EnvType, AlgName)  \n# instantiate the algorithm\nalg = eval(AlgName+'(**alg_params)')\n# start the training\nalg.learn(env=env, mode='train', render=False, **learn_params)  \n# test after training \nalg.learn(env=env, mode='test', render=True, **learn_params)  \n```\n\nThe main script `run_rlzoo.py` follows (almost) the same structure for all algorithms on all environments; see the [**full list of examples**](./examples.md).\n\n**General Descriptions:**\nRLzoo provides at least two types of interfaces for running the learning algorithms, with (1) implicit configurations or (2) explicit configurations. Both of them start the learning program by running a Python script, instead of running a long command line with all configurations shortened into its arguments (e.g. as in OpenAI Baselines). Our approaches are found to be more interpretable, flexible and convenient to apply in practice. 
According to the level of explicitness of learning configurations, we provide two different ways of setting learning configurations in Python scripts: the first one, with implicit configurations, uses a `default.py` script to record all configurations for each algorithm, while the second one, with explicit configurations, exposes all configurations to the running scripts. Both of them can run any RL algorithm on any environment supported in our repository with a simple command line.\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e1. Implicit Configurations\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\nRLzoo with **implicit configurations** means the configurations for learning are not explicitly contained in the main script for running (i.e. `run_rlzoo.py`), but in the `default.py` file in each algorithm folder (for example, `rlzoo/algorithms/sac/default.py` is the default parameter configuration for the SAC algorithm). All configurations, including (1) parameter values for the algorithm and learning process, (2) the network structures, (3) the optimizers, etc., are divided into configurations for the algorithm (stored in `alg_params`) and configurations for the learning process (stored in `learn_params`). Whenever you want to change the configurations for the algorithm or learning process, you can either go to the folder of each algorithm and modify parameters in `default.py`, or change the values in `alg_params` (a dictionary of configurations for the algorithm) and `learn_params` (a dictionary of configurations for the learning process) in `run_rlzoo.py` according to the keys. 
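Overriding the two configuration dictionaries by key is a plain dictionary update. The sketch below illustrates that pattern only; the dictionaries and keys shown (`gamma`, `tau`, `train_episodes`, `max_steps`) are hypothetical stand-ins, not guaranteed names returned by RLzoo's `call_default_params`.

```python
# Hypothetical sketch: overriding default configurations by key before training.
# The two dicts below stand in for what call_default_params(...) would return;
# the keys 'gamma', 'tau', 'train_episodes', 'max_steps' are illustrative assumptions.
alg_params = {'gamma': 0.99, 'tau': 0.005}
learn_params = {'train_episodes': 1000, 'max_steps': 200}

def override(params, **changes):
    """Return a copy of a configuration dict with selected keys replaced.

    Rejects unknown keys so a typo does not silently add a new entry.
    """
    unknown = set(changes) - set(params)
    if unknown:
        raise KeyError(f'unknown configuration keys: {unknown}')
    return {**params, **changes}

# Change only the entries you care about; all other defaults are kept.
alg_params = override(alg_params, gamma=0.95)
learn_params = override(learn_params, train_episodes=500)
```

The overridden dictionaries can then be passed on exactly as the defaults would be (e.g. unpacked into the algorithm constructor and `learn` call).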
\n\n#### Common Interface:\n\n```python\nfrom rlzoo.common.env_wrappers import build_env\nfrom rlzoo.common.utils import call_default_params\nfrom rlzoo.algorithms import *\n# choose an algorithm\nAlgName = 'TD3'\n# choose an environment\nEnvName = 'Pendulum-v0'  \n# select a corresponding environment type\nEnvType = ['classic_control', 'atari', 'box2d', 'mujoco', 'robotics', 'dm_control', 'rlbench'][0] \n# build an environment with wrappers\nenv = build_env(EnvName, EnvType)  \n# call default parameters for the algorithm and learning process\nalg_params, learn_params = call_default_params(env, EnvType, AlgName)  \n# instantiate the algorithm\nalg = eval(AlgName+'(**alg_params)')\n# start the training\nalg.learn(env=env, mode='train', render=False, **learn_params)  \n# test after training \nalg.learn(env=env, mode='test', render=True, **learn_params)  \n```\n\n\n```bash\n# in the root folder of rlzoo package\ncd rlzoo\npython run_rlzoo.py\n```\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e2. Explicit Configurations\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\nRLzoo with **explicit configurations** means the configurations for learning, including parameter values for the algorithm and the learning process, the network structures used in the algorithms and the optimizers, etc., are explicitly displayed in the main script for running. 
The main scripts for demonstration are under the folder of each algorithm; for example, `./rlzoo/algorithms/sac/run_sac.py` can be called with `python algorithms/sac/run_sac.py` from the folder `./rlzoo` to run the learning process the same as with the above implicit configurations.\n\n#### A Quick Example\n\n```python\nimport gym\nimport tensorflow as tf\nfrom rlzoo.common.utils import make_env, set_seed\nfrom rlzoo.algorithms import AC\nfrom rlzoo.common.value_networks import ValueNetwork\nfrom rlzoo.common.policy_networks import StochasticPolicyNetwork\n\n''' load environment '''\nenv = gym.make('CartPole-v0').unwrapped\nobs_space = env.observation_space\nact_space = env.action_space\n# reproducible\nseed = 2\nset_seed(seed, env)\n\n''' build networks for the algorithm '''\nnum_hidden_layer = 4  # number of hidden layers for the networks\nhidden_dim = 64  # dimension of hidden layers for the networks\nwith tf.name_scope('AC'):\n    with tf.name_scope('Critic'):\n        # choose the critic network, can be replaced with customized network\n        critic = ValueNetwork(obs_space, hidden_dim_list=num_hidden_layer * [hidden_dim])\n    with tf.name_scope('Actor'):\n        # choose the actor network, can be replaced with customized network\n        actor = StochasticPolicyNetwork(obs_space, act_space, hidden_dim_list=num_hidden_layer * [hidden_dim], output_activation=tf.nn.tanh)\nnet_list = [actor, critic]  # list of the networks\n\n''' choose optimizers '''\na_lr, c_lr = 1e-4, 1e-2  # a_lr: learning rate of the actor; c_lr: learning rate of the critic\na_optimizer = tf.optimizers.Adam(a_lr)\nc_optimizer = tf.optimizers.Adam(c_lr)\noptimizers_list = [a_optimizer, c_optimizer]  # list of optimizers\n\n# initialize the algorithm model, with algorithm parameters passed in\nmodel = AC(net_list, optimizers_list)\n''' \nfull list of arguments for the algorithm\n----------------------------------------\nnet_list: a list of networks (value and policy) used in the algorithm, 
from common functions or customization\noptimizers_list: a list of optimizers for all networks and differentiable variables\ngamma: discount factor of reward\naction_range: scale of action values\n'''\n\n# start the training process, with learning parameters passed in\nmodel.learn(env, train_episodes=500,  max_steps=200,\n            save_interval=50, mode='train', render=False)\n''' \nfull list of parameters for training\n---------------------------------------\nenv: learning environment\ntrain_episodes:  total number of episodes for training\ntest_episodes:  total number of episodes for testing\nmax_steps:  maximum number of steps for one episode\nsave_interval: time steps for saving the weights and plotting the results\nmode: 'train' or 'test'\nrender:  if true, visualize the environment\n'''\n\n# test after training\nmodel.learn(env, test_episodes=100, max_steps=200,  mode='test', render=True)\n```\n\nIn the package folder, we provide examples with explicit configurations for each algorithm. \n\n```bash\n# in the root folder of rlzoo package\ncd rlzoo\npython algorithms/\u003cALGORITHM_NAME\u003e/run_\u003cALGORITHM_NAME\u003e.py \n# for example: run actor-critic\npython algorithms/ac/run_ac.py\n```\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n### Interactive Configurations\nWe also provide an interactive learning configuration with Jupyter Notebook and *ipywidgets*, where you can select the algorithm, environment, and general learning settings by simply clicking on dropdown lists and sliders! A video demonstrating the usage follows. 
The interactive mode can be used with [`rlzoo/interactive/main.ipynb`](https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/interactive/main.ipynb) by running `$ jupyter notebook` to open it.\n\n![Interactive Video](https://github.com/tensorlayer/RLzoo/blob/master/gif/interactive.gif)\n\t\n\t\n### Distributed Training\nRLzoo supports distributed training across multiple computational nodes with multiple CPUs/GPUs, using the [KungFu](https://github.com/lsds/KungFu) package. Installing KungFu requires *CMake* and *Golang*; for details, see the [website of KungFu](https://github.com/lsds/KungFu).\nAn example of distributed training is contained in the folder `rlzoo/distributed`; by running the following command, you will launch the distributed training process: \n```bash\nrlzoo/distributed/run_dis_train.sh\n```\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eCode in Bash script\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\t\n```bash\n#!/bin/sh\nset -e\n\ncd $(dirname $0)\n\nkungfu_flags() {\n    echo -q\n    echo -logdir logs\n\n    local ip1=127.0.0.1\n    local np1=$np\n\n    local ip2=127.0.0.10\n    local np2=$np\n    local H=$ip1:$np1,$ip2:$np2\n    local m=cpu,gpu\n\n    echo -H $ip1:$np1\n}\n\nprun() {\n    local np=$1\n    shift\n    kungfu-run $(kungfu_flags) -np $np $@\n}\n\nn_learner=2\nn_actor=2\nn_server=1\n\nflags() {\n    echo -l $n_learner\n    echo -a $n_actor\n    echo -s $n_server\n}\n\nrl_run() {\n    local n=$((n_learner + n_actor + n_server))\n    prun $n python3 training_components.py $(flags)\n}\n\nmain() {\n    rl_run\n}\n\nmain\n```\nThe script specifies the IP addresses of the different computational nodes, as well as the number of policy learners (updating the models), actors (sampling through interaction with environments) and inference servers (performing policy forward inference during sampling) as `n_learner`, `n_actor` and `n_server` respectively. 
`n_server` can only be 1 in the current version.\n\t\n\u003c/div\u003e\n\u003c/details\u003e\n\nOther training details are specified in an individual Python script named `training_components.py` **within the same directory** as `run_dis_train.sh`, which can be seen as follows.\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eCode in Python script\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\t\n```python\nfrom rlzoo.common.env_wrappers import build_env\nfrom rlzoo.common.policy_networks import *\nfrom rlzoo.common.value_networks import *\nfrom rlzoo.algorithms.dppo_clip_distributed.dppo_clip import DPPO_CLIP\nfrom functools import partial\n\n# Specify the training configurations\ntraining_conf = {\n    'total_step': int(1e7),  # overall training timesteps\n    'traj_len': 200,         # length of the rollout trajectory\n    'train_n_traj': 2,       # update the models after every certain number of trajectories for each learner \n    'save_interval': 10,     # saving the models after every certain number of updates\n}\n\n# Specify the environment and launch it\nenv_name, env_type = 'CartPole-v0', 'classic_control'\nenv_maker = partial(build_env, env_name, env_type)\ntemp_env = env_maker()\nobs_shape, act_shape = temp_env.observation_space.shape, temp_env.action_space.shape\n\nenv_conf = {\n    'env_name': env_name,\n    'env_type': env_type,\n    'env_maker': env_maker,\n    'obs_shape': obs_shape,\n    'act_shape': act_shape,\n}\n\n\ndef build_network(observation_space, action_space, name='DPPO_CLIP'):\n    \"\"\" build networks for the algorithm \"\"\"\n    hidden_dim = 256\n    num_hidden_layer = 2\n    critic = ValueNetwork(observation_space, [hidden_dim] * num_hidden_layer, name=name + '_value')\n\n    actor = StochasticPolicyNetwork(observation_space, action_space,\n                                    [hidden_dim] * num_hidden_layer,\n                                    trainable=True,\n                    
                name=name + '_policy')\n    return critic, actor\n\n\ndef build_opt(actor_lr=1e-4, critic_lr=2e-4):\n    \"\"\" choose the optimizer for learning \"\"\"\n    import tensorflow as tf\n    return [tf.optimizers.Adam(critic_lr), tf.optimizers.Adam(actor_lr)]\n\n\nnet_builder = partial(build_network, temp_env.observation_space, temp_env.action_space)\nopt_builder = partial(build_opt, )\n\nagent_conf = {\n    'net_builder': net_builder,\n    'opt_builder': opt_builder,\n    'agent_generator': partial(DPPO_CLIP, net_builder, opt_builder),\n}\ndel temp_env\n\nfrom rlzoo.distributed.start_dis_role import main\n\nprint('Start Training.')\nmain(training_conf, env_conf, agent_conf)\nprint('Training Finished.')\n\t\n```\nUsers can specify the environment, network architectures, optimizers and other training details in this script.\n\t\n\u003c/div\u003e\n\u003c/details\u003e\n\t\nNote: if RLzoo is installed, you can create the two scripts `run_dis_train.sh` and `training_components.py` in any directory to launch distributed training, as long as the two scripts are in the same directory.\n\t\n\n\n## Contents\n### Algorithms\n\nChoices for `AlgName`: 'DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'\n\n| Algorithms      | Papers |\n| --------------- | -------|\n|**Value-based**||\n| Q-learning      | [Technical note: Q-learning. Watkins et al. 1992](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)|\n| Deep Q-Network (DQN)| [Human-level control through deep reinforcement learning. Mnih et al. 2015.](https://www.nature.com/articles/nature14236/) |\n| Prioritized Experience Replay | [Prioritized experience replay. Schaul et al. 2015.](https://arxiv.org/abs/1511.05952) |\n|Dueling DQN|[Dueling network architectures for deep reinforcement learning. Wang et al. 2015.](https://arxiv.org/abs/1511.06581)|\n|Double DQN|[Deep reinforcement learning with double q-learning. van Hasselt et al. 
2016.](https://arxiv.org/abs/1509.06461)|\n|Retrace|[Safe and efficient off-policy reinforcement learning. Munos et al. 2016.](https://arxiv.org/pdf/1606.02647.pdf)|\n|Noisy DQN|[Noisy networks for exploration. Fortunato et al. 2017.](https://arxiv.org/pdf/1706.10295.pdf)|\n| Distributional DQN (C51)| [A distributional perspective on reinforcement learning. Bellemare et al. 2017.](https://arxiv.org/pdf/1707.06887.pdf) |\n|**Policy-based**||\n|REINFORCE (PG) | [Simple statistical gradient-following algorithms for connectionist reinforcement learning. Williams 1992.](https://link.springer.com/article/10.1007/BF00992696)|\n| Trust Region Policy Optimization (TRPO)| [Trust region policy optimization. Schulman et al. 2015.](https://arxiv.org/pdf/1502.05477.pdf) |\n| Proximal Policy Optimization (PPO) | [Proximal policy optimization algorithms. Schulman et al. 2017.](https://arxiv.org/abs/1707.06347) |\n|Distributed Proximal Policy Optimization (DPPO)|[Emergence of locomotion behaviours in rich environments. Heess et al. 2017.](https://arxiv.org/abs/1707.02286)|\n|**Actor-Critic**||\n|Actor-Critic (AC)| [Actor-critic algorithms. Konda et al. 2000.](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)|\n| Asynchronous Advantage Actor-Critic (A3C)| [Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.](https://arxiv.org/pdf/1602.01783.pdf) |\n| Deep Deterministic Policy Gradient (DDPG) | [Continuous control with deep reinforcement learning. Lillicrap et al. 2016.](https://arxiv.org/pdf/1509.02971.pdf) |\n|Twin Delayed DDPG (TD3)|[Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.](https://arxiv.org/pdf/1802.09477.pdf)|\n|Soft Actor-Critic (SAC)|[Soft actor-critic algorithms and applications. Haarnoja et al. 
2018.](https://arxiv.org/abs/1812.05905)|\n\n### Environments\n\nChoices for `EnvType`: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'\n\n* [**OpenAI Gym**](https://gym.openai.com/envs):\n    * Atari\n    * Box2D\n    * Classic control\n    * MuJoCo\n    * Robotics\n\n* [**DeepMind Control Suite**](https://github.com/deepmind/dm_control)\n* [**RLBench**](https://github.com/stepjam/RLBench)\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eSome notes on environment usage.\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\n\n* Make sure the name of the environment matches its type in the main script. The types of environments include: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'.\n* When using the DeepMind Control Suite, install the [dm2gym](https://github.com/zuoxingdong/dm2gym) package with: `pip install dm2gym`\n\n* When using the RLBench environments, please add the path of your local RLBench repository to `PYTHONPATH`: \n  ```export PYTHONPATH=PATH_TO_YOUR_LOCAL_RLBENCH_REPO```\n* A dictionary of all supported environments is stored in `./rlzoo/common/env_list.py`\n* The full list of environments in RLBench is [here](https://github.com/stepjam/RLBench/blob/master/rlbench/tasks/__init__.py).\n* Installation of Vrep-\u003ePyRep-\u003eRLBench follows [here](http://www.coppeliarobotics.com/downloads.html)-\u003e[here](https://github.com/stepjam/PyRep)-\u003e[here](https://github.com/stepjam/RLBench).\n\n\u003c/div\u003e\n\u003c/details\u003e\n\n\n## Configurations\nThe supported configurations for RL algorithms with corresponding environments in RLzoo are listed in the following table.\n\n| Algorithms                 | Action Space        | Policy        | Update     | Envs                                                         |\n| -------------------------- | ------------------- | ------------- | ---------- | 
------------------------------------------------------------ |\n| DQN (double, dueling, PER) | Discrete Only       | --            | Off-policy | Atari, Classic Control                                       |\n| AC                         | Discrete/Continuous | Stochastic    | On-policy  | All                                                          |\n| PG                         | Discrete/Continuous | Stochastic    | On-policy  | All                                                          |\n| DDPG                       | Continuous          | Deterministic | Off-policy | Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench |\n| TD3                        | Continuous          | Deterministic | Off-policy | Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench |\n| SAC                        | Continuous          | Stochastic    | Off-policy | Classic Control, Box2D, Mujoco, Robotics, DeepMind Control, RLBench |\n| A3C                        | Discrete/Continuous | Stochastic    | On-policy  | Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control |\n| PPO                        | Discrete/Continuous | Stochastic    | On-policy  | All                                                          |\n| DPPO                       | Discrete/Continuous | Stochastic    | On-policy  | Atari, Classic Control, Box2D, Mujoco, Robotics, DeepMind Control |\n| TRPO                       | Discrete/Continuous | Stochastic    | On-policy  | All                                                          |\n\n\n## Properties\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e1. Automatic model construction\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\nWe aim to make it easy to configure all components within RL, including replacing the networks, optimizers, etc. 
We also provide automatically adaptive policies and value functions in the common functions: for the observation space, vector states or raw-pixel (image) states are supported automatically according to the shape of the space; for the action space, discrete or continuous actions are supported automatically according to the shape of the space as well. The deterministic or stochastic property of the policy needs to be chosen according to each algorithm. Some environments with raw-pixel-based observations (e.g. Atari, RLBench) may be hard to train; be patient and play around with the hyperparameters!\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e2. Simple and flexible API\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\nAs described in the Usage section, we provide at least two ways of deploying RLzoo: the implicit and the explicit configuration process. We ensure maximum flexibility for different use cases with this design.\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e3. Sufficient support for DRL algorithms and environments\u003c/b\u003e \u003ci\u003e[click to expand]\u003c/i\u003e\u003c/summary\u003e\n\u003cdiv\u003e\nAs shown in the algorithm and environment tables above.\t\n\u003c/div\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003e4. 
Interactive reinforcement learning configuration</b> <i>[click to expand]</i></summary>
<div>

As shown in the interactive use case in the Usage section, a Jupyter notebook ([`rlzoo/interactive/main.ipynb`](https://github.com/tensorlayer/RLzoo/blob/master/rlzoo/interactive/main.ipynb)) is provided for configuring the whole learning process more intuitively.
</div>
</details>




## Troubleshooting

* If you meet the error *'AttributeError: module 'tensorflow' has no attribute 'contrib''* when running the code after installing tensorflow-probability, try:
  `pip install --upgrade tf-nightly-2.0-preview tfp-nightly`
* When trying to use RLBench environments, *'No module named rlbench'* means either that RLBench is not installed locally or that the Python path is wrong. Run `export PYTHONPATH=/path/to/your/PyRep/RLBench` each time before launching a learning script with an RLBench environment, or add that line to your `~/.bashrc` once and for all.
* If you meet an error that the Qt platform is not loaded correctly when using DeepMind Control Suite environments, it is probably caused by your Ubuntu system not being version 14.04 or 16.04.
Check [here](https://github.com/deepmind/dm_control) for details.

## Credits
Our core contributors include:

[Zihan Ding](https://github.com/quantumiracle?tab=repositories),
[Tianyang Yu](https://github.com/Tokarev-TT-33),
[Yanhua Huang](https://github.com/Officium),
[Hongming Zhang](https://github.com/initial-h),
[Guo Li](https://github.com/lgarithm),
Quancheng Guo,
[Luo Mai](https://github.com/luomai),
[Hao Dong](https://github.com/zsdonghao)


## Citing

```
@article{ding2020rlzoo,
  title={RLzoo: A Comprehensive and Adaptive Reinforcement Learning Library},
  author={Ding, Zihan and Yu, Tianyang and Huang, Yanhua and Zhang, Hongming and Mai, Luo and Dong, Hao},
  journal={arXiv preprint arXiv:2009.08644},
  year={2020}
}
```

and

```
@book{deepRL-2020,
  title={Deep Reinforcement Learning: Fundamentals, Research, and Applications},
  editor={Hao Dong and Zihan Ding and Shanghang Zhang},
  author={Hao Dong and Zihan Ding and Shanghang Zhang and Hang Yuan and Hongming Zhang and Jingqing Zhang and Yanhua Huang and Tianyang Yu and Huaqing Zhang and Ruitong Huang},
  publisher={Springer Nature},
  note={\url{http://www.deepreinforcementlearningbook.org}},
  year={2020}
}
```

## Other Resources
<br/>
<a href="http://www.broadview.com.cn/book/6544" target="_blank">
	<div align="center">
		<img src="http://download.broadview.com.cn/ScreenShow/2106dcc52ead176cb568" width="20%"/>
	</div>
</a>

<a href="https://deepreinforcementlearningbook.org" target="_blank">
	<div align="center">
		<img src="http://deep-reinforcement-learning-book.github.io/assets/images/cover_v1.png" width="20%"/>
	</div>
<!-- 	<div align="center"><caption>Slack Invitation 
Link</caption></div> -->
</a>
<br/>

<br/>
<a href="https://deepreinforcementlearningbook.org" target="_blank">
	<div align="center">
		<img src="docs/img/logo.png" width="80%"/>
	</div>
</a>
<br/>