Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/huawei-noah/xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms
- Host: GitHub
- URL: https://github.com/huawei-noah/xingtian
- Owner: huawei-noah
- License: mit
- Created: 2020-08-15T14:13:06.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-09-12T12:56:36.000Z (about 1 year ago)
- Last Synced: 2024-06-24T06:00:55.266Z (5 months ago)
- Topics: dqn, impala, muzero, ppo, qmix, reinforcement-learning-algorithms
- Language: Python
- Homepage:
- Size: 7.05 MB
- Stars: 302
- Watchers: 14
- Forks: 89
- Open Issues: 12
- Metadata Files:
- Readme: README.cn.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - huawei-noah/xingtian
README
[English](./README.md)
## Introduction
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
**刑天 (XingTian)** is a componentized reinforcement learning library for developing and verifying RL algorithms. It currently supports algorithm families such as DQN, DDPG, PPO, and IMPALA, and can train agents in a variety of environments, including Gym, Atari, Torcs, and StarCraft II. To let users quickly verify ideas and solve RL problems, XingTian abstracts four modules: `Algorithm`, `Model`, `Agent`, and `Environment`. They are combined the way Lego bricks are. For more detail, read the [architecture introduction](./docs/basic_arch.cn.md).
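As a rough illustration of this "Lego" idea, the sketch below assembles the four module kinds from plain Python classes. The classes here are hypothetical stand-ins, not XingTian's real interfaces; they only show how a training task is composed by picking one component of each kind.

```python
# Conceptual sketch of the four-module composition. The classes below are
# hypothetical stand-ins, NOT XingTian's real ones (see docs/basic_arch.cn.md).
from dataclasses import dataclass

@dataclass
class Environment:   # wraps a simulator such as Gym, Atari, Torcs, StarCraft II
    name: str

@dataclass
class Model:         # the neural network, e.g. an MLP policy
    name: str

@dataclass
class Agent:         # interacts with the environment and collects trajectories
    model: Model
    env: Environment

@dataclass
class Algorithm:     # consumes trajectories and updates the model (DQN, PPO, ...)
    name: str
    agent: Agent

# "Lego"-style assembly: swap any brick by changing one name in the config.
task = Algorithm("PPO", Agent(Model("PpoMlp"), Environment("CartPole-v0")))
```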
## System Dependencies
```shell
# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev -y
pip3 install opencv-python

# Run with tensorflow 1.15.0 or tensorflow 2.3.1
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 absl-py psutil tensorboardX setproctitle
```

Alternatively, install the dependencies with pip: `pip3 install -r requirements.txt`
If you want to use PyTorch as the backend engine, please install it yourself. [Ref PyTorch](https://pytorch.org/get-started/locally/)
## Installation
```zsh
# cd PATH/TO/XingTian
pip3 install -e .
```

You can confirm the installation with `import xt; print(xt.__version__)`.
```python
In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.3.0'
```

## Quick Start
---------
#### Parameter Configuration
Below is an example configuration for the simple [CartPole](https://gym.openai.com/envs/CartPole-v0/) task. A training task is assembled by combining the algorithm and environment information already registered in the system. More detailed descriptions of the parameters can be found in the [user guide](./docs/user.cn.md).

```yaml
alg_para:
  alg_name: PPO
  alg_config:
    process_num: 1
    save_model: True  # default False
    save_interval: 100

env_para:
  env_name: GymEnv
  env_info:
    name: CartPole-v0
    vision: False

agent_para:
  agent_name: PPO
  agent_num : 1
  agent_config:
    max_steps: 200
    complete_step: 1000000
    complete_episode: 3550

model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    input_dtype: float32
    model_config:
      BATCH_SIZE: 200
      CRITIC_LOSS_COEF: 1.0
      ENTROPY_LOSS: 0.01
      LR: 0.0003
      LOSS_CLIPPING: 0.2
      MAX_GRAD_NORM: 5.0
      NUM_SGD_ITER: 8
      SUMMARY: False
      VF_SHARE_LAYERS: False
      activation: tanh
      hidden_sizes: [64, 64]

env_num: 10
```

More training configuration examples can be found in the [examples](./examples) directory.
#### Start a Training Task
```
python3 xt/main.py -f examples/cartpole_ppo.yaml -t train
```
![img](./docs/.images/cartpole.gif)
#### Evaluate a Local Model
Set the `benchmark.eval.model_path` parameter in your `.yaml` file, then run the evaluation task with `-t evaluate`.
```
benchmark:
  eval:
    model_path: /YOUR/PATH/TO/EVAL/models
    gap: 10            # interval between the models in the directory to evaluate
    evaluator_num: 1   # number of evaluator instances to launch; parallel evaluation is supported

# Run command
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate
```

> Training is the default task type, i.e. the default value of `-t` is `train`.
#### Using the Command Line
```zsh
# In a terminal, xt_main can be used directly in place of python3 xt/main.py
xt_main -f examples/cartpole_ppo.yaml -t train

# train with evaluation
xt_main -f examples/cartpole_ppo.yaml -t train_with_evaluate
```

## Developing a Custom Task
1. Write a custom module and register it (a hypothetical sketch follows this list); see the [developer guide](./docs/developer.cn.md) for details.
2. Configure the name of your custom module in the configuration file `your_train_configure.yaml`.
3. Start training with `xt_main -f path/to/your_train_configure.yaml` :)
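As a minimal sketch of step 1, the snippet below registers a hypothetical custom model under a name that the YAML configuration can then reference. The registry and decorator are illustrative only; XingTian's actual base classes and registration helpers are described in the [developer guide](./docs/developer.cn.md).

```python
# Illustrative only -- this is NOT XingTian's real registration API.
MODEL_REGISTRY = {}

def register_model(name):
    """Hypothetical stand-in for the framework's model registration decorator."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

# Step 1: write a custom module and register it under a name.
@register_model("MyCartPoleMlp")
class MyCartPoleMlp:
    def __init__(self, state_dim, action_dim, hidden_sizes=(64, 64)):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.hidden_sizes = hidden_sizes

# Step 2: reference the registered name in your_train_configure.yaml, e.g.
#   model_para:
#     actor:
#       model_name: MyCartPoleMlp
# Step 3: xt_main -f path/to/your_train_configure.yaml -t train
```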
## Reference Results

#### Average Training Reward
1. **DQN** convergence reward after 10M steps (**40M frames**).
| env | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
| ------------- | ------------------ | --------------- | ----------------- |
| BeamRider | 6706 | 2869 | ~2000 |
| Breakout | 352 | 287 | ~150 |
| QBert | 14087 | 3921 | ~4000 |
| SpaceInvaders | 947 | 650 | ~500 |

2. **PPO** convergence reward after 10M steps (**40M frames**).
| env | XingTian PPO | RLlib PPO | Baselines PPO |
| ------------- | ------------ | --------- | ------------- |
| BeamRider | 4877 | 2807 | ~1800 |
| Breakout | 341 | 104 | ~250 |
| QBert | 14771 | 11085 | ~14000 |
| SpaceInvaders | 1025 | 671 | ~800 |

3. **IMPALA** convergence reward after 10M steps (**40M frames**).
| env | XingTian IMPALA | RLlib IMPALA |
| ------------- | --------------- | ------------ |
| BeamRider | 2313 | 2071 |
| Breakout | 334 | 385 |
| QBert | 12205 | 4068 |
| SpaceInvaders | 742 | 719 |

#### Throughput
1. **DQN**
| env | XingTian Basic DQN | RLlib Basic DQN |
| ------------- | ------------------ | --------------- |
| BeamRider | 129 | 109 |
| Breakout | 117 | 113 |
| QBert | 111 | 90 |
| SpaceInvaders | 115 | 100 |

2. **PPO**
| env | XingTian PPO | RLlib PPO |
| ------------- | ------------ | --------- |
| BeamRider | 2422 | 1618 |
| Breakout | 2497 | 1535 |
| QBert | 2436 | 1617 |
| SpaceInvaders | 2438 | 1608 |

3. **IMPALA**
| env | XingTian IMPALA | RLlib IMPALA |
| ------------- | --------------- | ------------ |
| BeamRider | 8756 | 3637 |
| Breakout | 8814 | 3525 |
| QBert | 8249 | 3471 |
| SpaceInvaders | 8463 | 3555 |

> Experiment hardware: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with a single Tesla V100
>
> Ray reward data come from [https://github.com/ray-project/rl-experiments](https://github.com/ray-project/rl-experiments); throughput was measured on the hardware listed above.

## Acknowledgements
XingTian draws on the following projects: [DeepMind/scalable_agent](https://github.com/deepmind/scalable_agent), [baselines](https://github.com/openai/baselines), [ray](https://github.com/ray-project/ray).
## License
The MIT License (MIT)