https://github.com/YeWR/EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

Host: GitHub
URL: https://github.com/YeWR/EfficientZero
Owner: YeWR
License: gpl-3.0
Created: 2021-10-21T06:03:14.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-12-20T07:31:16.000Z (over 1 year ago)
Last Synced: 2024-08-09T13:19:31.483Z (11 months ago)
Language: Python
Size: 2.11 MB
Stars: 851
Watchers: 47
Forks: 133
Open Issues: 32
Metadata Files:
- Readme: README.md
- License: LICENSE


# EfficientZero (NeurIPS 2021)

Open-source codebase for EfficientZero, from ["Mastering Atari Games with Limited Data"](https://arxiv.org/abs/2111.00210) at NeurIPS 2021.

## Environments

EfficientZero requires Python 3 (>=3.6) and PyTorch (>=1.8.0) with the development headers.

We recommend using torch amp (`--amp_type torch_amp`) to accelerate training.

### Prerequisites

Before training, build the C++/Cython external packages (GCC 7.5+ is required):

```
cd core/ctree
bash make.sh
```

The distributed framework of this codebase is built on [ray](https://docs.ray.io/en/releases-1.0.0/auto_examples/overview.html).

### Installation

Install the remaining packages required by this codebase with `pip install -r requirements.txt`.

## Usage

### Quick start

* Train: `python main.py --env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp --num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force`

* Test: `python main.py --env BreakoutNoFrameskip-v4 --case atari --opr test --amp_type torch_amp --num_gpus 1 --load_model --model_path model.p`

### Bash file

We provide `train.sh` and `test.sh` for training and evaluation.

* Train: 

  * With 4 GPUs (3090): `bash train.sh`

* Test: `bash test.sh`

| Required Arguments | Description |
|:-------------|:-------------|
| `--env` | Name of the environment |
| `--case {atari}` | Domain to run; used to switch between domains (default: atari) |
| `--opr {train,test}` | Operation to perform |
| `--amp_type {torch_amp,none}` | Whether to use torch amp for acceleration |

| Other Arguments | Description |
|:-------------|:-------------|
| `--force` | Overwrite the result directory |
| `--num_gpus 4` | Number of available GPUs |
| `--num_cpus 96` | Number of available CPUs |
| `--cpu_actor 14` | Number of CPU workers |
| `--gpu_actor 20` | Number of GPU workers |
| `--seed 0` | Random seed |
| `--use_priority` | Use priority in replay buffer sampling |
| `--use_max_priority` | Use the max priority for newly collected data |
| `--amp_type 'torch_amp'` | Use torch amp for acceleration |
| `--info 'EZ-V0'` | Tags for your experiments |
| `--p_mcts_num 8` | Number of parallel environments in self-play |
| `--revisit_policy_search_rate 0.99` | Rate of reanalyzing policies |
| `--use_root_value` | Use root values in value targets (requires more GPU actors) |
| `--render` | Render during evaluation |
| `--save_video` | Save videos during evaluation |
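
The flags above map naturally onto an `argparse` parser. The following is a minimal sketch, not the parser used in `main.py`: it covers only a subset of the documented flags, the function name `build_parser` is hypothetical, and the defaults are assumptions copied from the example values shown in the tables.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a subset of the documented flags."""
    parser = argparse.ArgumentParser(description="EfficientZero flags (illustrative sketch)")

    # Arguments from the "Required Arguments" table.
    parser.add_argument("--env", required=True, help="name of the environment")
    # The table lists atari as the default, so it is not marked required here.
    parser.add_argument("--case", default="atari", choices=["atari"], help="domain to run")
    parser.add_argument("--opr", required=True, choices=["train", "test"], help="operation to perform")
    parser.add_argument("--amp_type", required=True, choices=["torch_amp", "none"],
                        help="whether to use torch amp")

    # A few of the "Other Arguments"; defaults are assumed from the table's example values.
    parser.add_argument("--force", action="store_true", help="overwrite the result directory")
    parser.add_argument("--num_gpus", type=int, default=4)
    parser.add_argument("--num_cpus", type=int, default=96)
    parser.add_argument("--cpu_actor", type=int, default=14)
    parser.add_argument("--gpu_actor", type=int, default=20)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--p_mcts_num", type=int, default=8)
    return parser


# Parse the quick-start training command (everything after `python main.py`).
quick_start = (
    "--env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp "
    "--num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force"
)
args = build_parser().parse_args(quick_start.split())
print(args.env, args.opr, args.num_gpus, args.force)
# BreakoutNoFrameskip-v4 train 1 True
```

Flags that are not passed, such as `--seed`, fall back to the assumed defaults in this sketch.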

 

## Architecture Designs

The architecture of the training pipeline is shown as follows:

![](static/imgs/archi.png)

### Some suggestions

* To use a smaller model, choose smaller dimensions for the projection layers (e.g. 256/64) and the LSTM hidden layer (e.g. 64) in the config.

* For GPUs with 10 GB of memory instead of 20 GB, you can allocate 0.25 GPU to each GPU worker (`@ray.remote(num_gpus=0.25)`) in `core/reanalyze_worker.py`.
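
Ray treats `num_gpus` as a fraction of one device, so a value of 0.25 lets four workers share a GPU. A rough way to pick the fraction is sketched below. The helper `gpu_fraction_per_worker` is hypothetical, and the 2.5 GB per-worker figure is an assumption chosen so that a 10 GB card yields 0.25; measure the actual memory use on your hardware.

```python
import math


def gpu_fraction_per_worker(gpu_memory_gb: float, worker_memory_gb: float = 2.5) -> float:
    """Return the fraction of one GPU to request per worker.

    worker_memory_gb is an assumed per-worker footprint, not a measured value.
    """
    if gpu_memory_gb <= 0 or worker_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    workers_per_gpu = math.floor(gpu_memory_gb / worker_memory_gb)
    if workers_per_gpu < 1:
        raise ValueError("a single worker does not fit on this GPU")
    return 1.0 / workers_per_gpu


fraction = gpu_fraction_per_worker(10)
print(fraction)  # 0.25

# With Ray installed, the value would go into the worker's decorator, e.g.:
# @ray.remote(num_gpus=fraction)
```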

### New environment registration

To apply EfficientZero to a new environment such as `mujoco`, register it as follows:

1. Following the layout of `config/atari`, create a directory for the environment at `config/mujoco`.

2. Implement your `MujocoConfig(BaseConfig)` class and implement the models as well as your environment wrapper.

3. Register the case at `main.py`.
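
The three steps can be sketched as follows. This is a self-contained illustration, not code from the repository: `BaseConfig` here is a tiny stand-in for the real class, and `training_steps`, `env_name`, the `CASES` registry, `load_game_config`, and the environment name `HalfCheetah-v2` are all hypothetical choices. Check `config/atari` for the methods the real `BaseConfig` expects you to override.

```python
class BaseConfig:
    """Stand-in for the repository's base config class (assumption)."""

    def __init__(self, training_steps: int):
        self.training_steps = training_steps
        self.env_name = None

    def set_game(self, env_name: str) -> None:
        raise NotImplementedError


class MujocoConfig(BaseConfig):
    """Step 2: domain-specific config, which would live under config/mujoco."""

    def __init__(self):
        super().__init__(training_steps=100_000)  # placeholder value

    def set_game(self, env_name: str) -> None:
        # A real config would also set observation/action shapes and build
        # the environment wrapper here.
        self.env_name = env_name


# Step 3: register the case so the entry point can select it by name.
CASES = {"mujoco": MujocoConfig}


def load_game_config(case: str, env_name: str) -> BaseConfig:
    """Look up the config class for a case and bind it to an environment."""
    if case not in CASES:
        raise ValueError(f"unsupported case: {case}")
    config = CASES[case]()
    config.set_game(env_name)
    return config


config = load_game_config("mujoco", "HalfCheetah-v2")
print(type(config).__name__, config.env_name)
# MujocoConfig HalfCheetah-v2
```

With a registry like this, the new domain would be selected by passing the matching value to `--case`.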

## Results 

Results are evaluated with 32 seeds for each of 3 runs (each run uses a different seed).

![](static/imgs/total_results.png)

## Citation

If you find this repo useful, please cite our paper:

```
@inproceedings{ye2021mastering,
  title={Mastering Atari Games with Limited Data},
  author={Weirui Ye and Shaohuai Liu and Thanard Kurutach and Pieter Abbeel and Yang Gao},
  booktitle={NeurIPS},
  year={2021}
}
```

## Contact

If you have any questions or want to use the code, please contact [email protected].

## Acknowledgement

We greatly appreciate the following GitHub repos for their valuable code base implementations:

https://github.com/koulanurag/muzero-pytorch

https://github.com/werner-duvaud/muzero-general

https://github.com/pytorch/ELF
