Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
- Host: GitHub
- URL: https://github.com/YeWR/EfficientZero
- Owner: YeWR
- License: gpl-3.0
- Created: 2021-10-21T06:03:14.000Z
- Default Branch: main
- Last Pushed: 2023-12-20T07:31:16.000Z
- Last Synced: 2024-11-28T02:34:47.746Z
- Language: Python
- Size: 2.11 MB
- Stars: 871
- Watchers: 47
- Forks: 137
- Open Issues: 32
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
 
Awesome Lists containing this project
- StarryDivineSky - YeWR/EfficientZero
# EfficientZero (NeurIPS 2021)
Open-source codebase for EfficientZero, from ["Mastering Atari Games with Limited Data"](https://arxiv.org/abs/2111.00210) at NeurIPS 2021.
## Environments
EfficientZero requires Python 3 (>= 3.6) and PyTorch (>= 1.8.0) with the development headers.
We recommend using torch AMP (`--amp_type torch_amp`) to accelerate training.
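Before building anything, a small script can confirm the Python and PyTorch versions listed above. This is a hypothetical helper, not something shipped with the repo; it compares version numbers numerically (so `1.10` counts as newer than `1.8`) and does not check for the development headers or GCC.

```python
import importlib
import importlib.util
import re
import sys

MIN_PYTHON = (3, 6)
MIN_TORCH = (1, 8, 0)


def parse_version(version: str) -> tuple:
    """Turn strings like '1.8.0+cu111' or '2.1' into (1, 8, 0) or (2, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (None) is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def check_requirements() -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} < 3.6")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        torch = importlib.import_module("torch")
        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append(f"PyTorch {torch.__version__} < 1.8.0")
    return problems


if __name__ == "__main__":
    for problem in check_requirements() or ["all version checks passed"]:
        print(problem)
```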
### Prerequisites
Before training, you need to build the C++/Cython external packages (GCC 7.5+ is required):
```
cd core/ctree
bash make.sh
``` 
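To confirm that the build produced compiled extension modules, a check like the one below can list them. It assumes `make.sh` compiles the Cython extensions in place under `core/ctree`; since the exact module names are not given here, the sketch only looks for files ending in one of the platform's extension-module suffixes (`.so`, `.pyd`, ...). Run it from the repository root.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def find_built_extensions(directory: str = "core/ctree") -> list:
    """Return compiled extension modules found anywhere under directory."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and any(path.name.endswith(suffix) for suffix in EXTENSION_SUFFIXES)
    )


if __name__ == "__main__":
    built = find_built_extensions()
    if built:
        for path in built:
            print("found", path)
    else:
        print("no compiled extensions found; did `bash make.sh` succeed?")
```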
The distributed framework of this codebase is built on [Ray](https://docs.ray.io/en/releases-1.0.0/auto_examples/overview.html).
### Installation
To install the other packages required by this codebase, run `pip install -r requirements.txt`.
## Usage
### Quick start
* Train: `python main.py --env BreakoutNoFrameskip-v4 --case atari --opr train --amp_type torch_amp --num_gpus 1 --num_cpus 10 --cpu_actor 1 --gpu_actor 1 --force`
* Test: `python main.py --env BreakoutNoFrameskip-v4 --case atari --opr test --amp_type torch_amp --num_gpus 1 --load_model --model_path model.p`
### Bash file
We provide `train.sh` and `test.sh` for training and evaluation.
* Train: 
  * With 4 GPUs (3090): `bash train.sh`
* Test: `bash test.sh`

|Required Arguments | Description|
|:-------------|:-------------|
| `--env`                             |Name of the environment|
| `--case {atari}`                    |Domain to run; switches between different domains (default: `atari`)|
| `--opr {train,test}`                |Operation to perform|
| `--amp_type {torch_amp,none}`       |Use torch AMP for acceleration (`torch_amp`) or not (`none`)|

|Other Arguments | Description|
|:-------------|:-------------|
| `--force`                           |Overwrite the result directory|
| `--num_gpus 4`                      |Number of available GPUs|
| `--num_cpus 96`                     |Number of available CPUs|
| `--cpu_actor 14`                    |Number of CPU workers|
| `--gpu_actor 20`                    |Number of GPU workers|
| `--seed 0`                          |Random seed|
| `--use_priority`                    |Use prioritized sampling from the replay buffer|
| `--use_max_priority`                |Assign the maximum priority to newly collected data|
| `--amp_type 'torch_amp'`            |Use torch AMP for acceleration|
| `--info 'EZ-V0'`                    |Tags for your experiments|
| `--p_mcts_num 8`                    |Number of parallel environments in self-play|
| `--revisit_policy_search_rate 0.99` |Rate at which policies are reanalyzed|
| `--use_root_value`                  |Use root values in value targets (requires more GPU actors)|
| `--render`                          |Render the environment during evaluation|
| `--save_video`                      |Save videos during evaluation|
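The flags in these tables can also be assembled programmatically, for example to sweep seeds without editing `train.sh`. The helper below is a hypothetical sketch, not part of the repo: it assumes value-less flags such as `--force`, `--use_priority` or `--load_model` are plain switches (as the tables and quick-start commands suggest), and it only builds the argument list; launching it, e.g. with `subprocess.run(cmd)`, is left to the caller.

```python
import shlex


def build_command(env: str, opr: str = "train", case: str = "atari", **options) -> list:
    """Build an argv list for main.py from the documented flags.

    True -> bare switch (e.g. --force); False/None -> flag omitted;
    anything else -> '--name value'.
    """
    argv = ["python", "main.py", "--env", env, "--case", case, "--opr", opr]
    for name, value in options.items():
        flag = "--" + name
        # Identity checks keep 0 and 1 from being mistaken for False/True.
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    # Rebuilds the quick-start training command.
    cmd = build_command(
        "BreakoutNoFrameskip-v4",
        amp_type="torch_amp", num_gpus=1, num_cpus=10, cpu_actor=1, gpu_actor=1, force=True,
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

For a seed sweep, call `build_command(..., seed=s)` in a loop; because the checks use `is True` / `is False`, `seed=0` is still emitted as `--seed 0`.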
 
## Architecture Designs
The architecture of the training pipeline is shown as follows:

### Some suggestions
* To use a smaller model, set smaller dimensions for the projection layers (e.g., 256/64) and the LSTM hidden layer (e.g., 64) in the config.
* For GPUs with 10 GB of memory instead of 20 GB, you can allocate 0.25 GPU to each GPU worker (`@ray.remote(num_gpus=0.25)`) in `core/reanalyze_worker.py`.
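Both suggestions trade capacity or parallelism for memory, and the rough arithmetic behind them can be sketched as follows. The two-layer projection head, its 2304-dimensional input, the 512-dimensional LSTM input, the 1024/1024 and 512 "larger" sizes, and the 0.125 GPU fraction used for comparison are hypothetical placeholders, not values read from the repo's config. The LSTM count follows the parameter layout of a single-layer `torch.nn.LSTM`. Ray treats a fractional `num_gpus` request as a logical share of one GPU and does not enforce a memory split, so a larger fraction simply means fewer workers share each card.

```python
import math
from fractions import Fraction


def linear_params(n_in: int, n_out: int) -> int:
    # Weight matrix plus bias vector of a fully connected layer.
    return n_in * n_out + n_out


def projection_params(n_in: int, hidden: int, out: int) -> int:
    # Hypothetical two-layer projection head: n_in -> hidden -> out.
    return linear_params(n_in, hidden) + linear_params(hidden, out)


def lstm_params(n_in: int, hidden: int) -> int:
    # Single-layer torch.nn.LSTM: 4 gates, each with input weights,
    # hidden weights, and two bias vectors (b_ih and b_hh).
    return 4 * (hidden * n_in + hidden * hidden + 2 * hidden)


def workers_per_gpu(gpu_fraction: Fraction) -> int:
    # A fractional request (< 1 GPU) is placed on a single GPU, so at most
    # floor(1 / fraction) such workers can share one card.
    return math.floor(1 / gpu_fraction)


def memory_budget_gb(gpu_memory_gb: float, gpu_fraction: Fraction) -> float:
    # Memory each worker may use if every slot on the card is occupied.
    return gpu_memory_gb / workers_per_gpu(gpu_fraction)


if __name__ == "__main__":
    n_in = 2304  # hypothetical flattened hidden-state size
    print("projection 1024/1024:", projection_params(n_in, 1024, 1024))
    print("projection 256/64:   ", projection_params(n_in, 256, 64))
    print("LSTM hidden 512:     ", lstm_params(512, 512))
    print("LSTM hidden 64:      ", lstm_params(512, 64))
    for fraction in (Fraction(1, 8), Fraction(1, 4)):
        print(f"num_gpus={float(fraction)}: {workers_per_gpu(fraction)} workers per GPU, "
              f"{memory_budget_gb(10, fraction)} GB each on a 10 GB card")
```

Under these placeholder sizes the 256/64 head has roughly a sixth of the parameters of a 1024/1024 head, and moving from 0.125 to 0.25 GPU per worker doubles each worker's memory budget by halving how many share a card.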
### New environment registration
If you want to apply EfficientZero to a new environment such as `mujoco`, follow these registration steps:
1. Following the layout of `config/atari`, create a directory for the environment at `config/mujoco`.
2. Implement your `MujocoConfig(BaseConfig)` class, along with the models and your environment wrapper.
3. Register the case in `main.py`.
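One way to picture step 3 is a registry that maps each `--case` value to its config class. This is a hypothetical sketch with stub classes, not the repo's actual `main.py` (which may dispatch on `--case` differently); `BaseConfig` here is a minimal stand-in for the real class, and `HalfCheetah-v2` is only an illustrative environment name.

```python
import argparse


class BaseConfig:
    """Hypothetical stand-in for the repo's BaseConfig (fields are illustrative)."""

    def __init__(self, env_name: str):
        self.env_name = env_name


class AtariConfig(BaseConfig):
    """Placeholder for the existing Atari config under config/atari."""


class MujocoConfig(BaseConfig):
    """Step 2: the new config; a real one would also set up models and the env wrapper."""


# Step 3: map each --case value to its config class (hypothetical registry).
CASES = {
    "atari": AtariConfig,
    "mujoco": MujocoConfig,
}


def build_config(argv=None) -> BaseConfig:
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", required=True)
    parser.add_argument("--case", choices=sorted(CASES), default="atari")
    args = parser.parse_args(argv)
    return CASES[args.case](args.env)


if __name__ == "__main__":
    config = build_config(["--env", "HalfCheetah-v2", "--case", "mujoco"])
    print(type(config).__name__, config.env_name)  # MujocoConfig HalfCheetah-v2
```

With this pattern, adding a domain is a single dictionary entry, and `argparse` rejects unknown `--case` values on its own.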
## Results 
Each of the 3 training runs (with different seeds) is evaluated with 32 seeds.

## Citation
If you find this repo useful, please cite our paper:
```
@inproceedings{ye2021mastering,
  title={Mastering Atari Games with Limited Data},
  author={Weirui Ye and Shaohuai Liu and Thanard Kurutach and Pieter Abbeel and Yang Gao},
  booktitle={NeurIPS},
  year={2021}
}
```
## Contact
If you have any questions or want to use the code, please contact ywr20@mails.tsinghua.edu.cn.
## Acknowledgement
We thank the following GitHub repositories for their valuable codebase implementations:
* https://github.com/koulanurag/muzero-pytorch
* https://github.com/werner-duvaud/muzero-general
* https://github.com/pytorch/ELF