https://github.com/geyang/dmc_gen
- Host: GitHub
- URL: https://github.com/geyang/dmc_gen
- Owner: geyang
- License: MIT
- Created: 2021-02-21T20:10:01.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-13T14:25:06.000Z (about 4 years ago)
- Last Synced: 2025-01-10T12:58:32.973Z (5 months ago)
- Language: Python
- Size: 103 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DMControl Generalization Benchmark
This code base is a fork-in-progress of [Hansen & Wang 2020](https://arxiv.org/abs/2011.13389)'s benchmark for generalization in continuous control from pixels, based on [DMControl](https://github.com/deepmind/dm_control). The [./custom_vendor](./custom_vendor) folder contains the customized libraries; an installation guide for them can be found [there](./custom_vendor).
## Algorithms
This repository contains implementations of the following papers in a unified framework:
- [SODA (Hansen and Wang, 2020)](https://arxiv.org/abs/2011.13389)
- [PAD (Hansen et al., 2020)](https://arxiv.org/abs/2007.04309)
- [RAD (Laskin et al., 2020)](https://arxiv.org/abs/2004.14990)
- [CURL (Srinivas et al., 2020)](https://arxiv.org/abs/2004.04136)
- [SAC (Haarnoja et al., 2018)](https://arxiv.org/abs/1812.05905)

All methods use standardized architectures and hyper-parameters, wherever applicable.
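For orientation, here is a hypothetical sketch (not the repository's actual API) of what such a unified framework looks like: one agent per paper, all sharing the SAC backbone and selected by the same name that the `--algorithm` flag of `src/train.py` accepts below.

```python
# Hypothetical sketch, NOT the repository's actual API: a unified framework
# of this kind typically maps the `--algorithm` flag onto one agent
# constructor per paper, sharing architecture and hyper-parameters.
class SAC:  # base agent (Haarnoja et al., 2018)
    def __init__(self, obs_shape, action_shape, **hparams):
        self.obs_shape, self.action_shape, self.hparams = obs_shape, action_shape, hparams

class RAD(SAC): ...    # SAC + data augmentation (Laskin et al., 2020)
class CURL(SAC): ...   # SAC + contrastive auxiliary task (Srinivas et al., 2020)
class PAD(SAC): ...    # SAC + self-supervised test-time adaptation (Hansen et al., 2020)
class SODA(SAC): ...   # SAC + soft data augmentation objective (Hansen & Wang, 2020)

ALGORITHMS = {"sac": SAC, "rad": RAD, "curl": CURL, "pad": PAD, "soda": SODA}

def make_agent(name: str, obs_shape, action_shape, **hparams) -> SAC:
    """Build the agent selected by name, e.g. make_agent("soda", ...)."""
    return ALGORITHMS[name](obs_shape, action_shape, **hparams)
```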
## Setup
We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:
```bash
conda env create -f conda.yml
conda activate dmcgen
```
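Once the environment is activated, a quick check along these lines can confirm that a CUDA device is visible (assuming PyTorch is among the installed dependencies, since the SAC implementation here is PyTorch-based):

```python
# Illustrative sanity check: confirm the CUDA >= 9.2 requirement is met
# inside the activated conda environment.
import torch

print("CUDA available:", torch.cuda.is_available())
print("CUDA version PyTorch was built with:", torch.version.cuda)
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
```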
### Datasets Needed for Envs

Part of this repository relies on external datasets. SODA uses the [Places](http://places2.csail.mit.edu/download.html) dataset for data augmentation, which can be downloaded by running
```bash
wget http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar
```

You should familiarize yourself with [their terms](http://places2.csail.mit.edu/download.html) before downloading. After downloading and extracting the data, add your dataset directory to the `data_dirs` list in `src/augmentations.py`.
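For illustration, the edited list might look like this (the paths below are placeholders for wherever you extracted the archive; check `src/augmentations.py` for the actual surrounding code):

```python
# src/augmentations.py (illustrative excerpt) -- the directories below are
# placeholder paths; point them at your extracted Places dataset instead.
data_dirs = [
    "/data/places365_standard",
    "/home/user/datasets/places365_standard",
]
```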
> If `wget` is too slow for the download above, you can try `axel`, which parallelizes the download.
>
> ```bash
> sudo apt-get install axel
> axel http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar
> ```
>
> If you do not have sudo rights, you can install axel from source (or use a precompiled binary) from [here](https://github.com/axel-download-accelerator/axel).

The `video_easy` environment was proposed in [PAD](https://github.com/nicklashansen/policy-adaptation-during-deployment), and the `video_hard` environment uses a subset of the [RealEstate10K](https://google.github.io/realestate10k/) dataset for background rendering. All test environments (including video files) are included in this repository, in the `src/env/` directory.
To build the custom vendored environments and data, run:

```bash
cd custom_vendor && make
```

Remember to add the environment variable `export DMCGEN_DATA=$(pwd)/custom_vendor/data` so that it points to the location of this folder.
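A small check like the following (a sketch, not part of the repository) can verify that the variable resolves to a real directory before you launch training:

```python
# Illustrative sanity check: make sure DMCGEN_DATA is set and points
# at an existing directory before launching any training runs.
import os
from pathlib import Path

data_root = Path(os.environ.get("DMCGEN_DATA", ""))
if not data_root.is_dir():
    raise SystemExit(f"DMCGEN_DATA is unset or not a directory: {data_root!r}")
print("Using data root:", data_root)
```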
## Training & Evaluation
The `scripts` directory contains training and evaluation bash scripts for all the included algorithms. Alternatively, you can call the Python scripts directly, e.g. for training:
```bash
python3 src/train.py \
--algorithm soda \
--aux_lr 3e-4 \
--seed 0
```

to run SODA on the default task, `walker_walk`. This should give you an output of the form:
```
Working directory: logs/walker_walk/soda/0
Evaluating: logs/walker_walk/soda/0
| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730
| train | E: 1 | S: 250 | D: 70.1 s | R: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | AUXLOSS: 0.0000
```
where `ER` and `ERTEST` correspond to the average return in the training and test environments, respectively. You can select the test environment used in evaluation with the `--eval_mode` argument, which accepts one of `(train, color_easy, color_hard, video_easy, video_hard)`.
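For example, to evaluate one configuration across every mode, you could drive the CLI from a small script like this (a sketch using only the flags shown above; adjust it to your own setup):

```python
# Illustrative sweep (not a repository script): launch one training run per
# evaluation mode using only the CLI flags documented in this README.
import subprocess

EVAL_MODES = ["train", "color_easy", "color_hard", "video_easy", "video_hard"]

for mode in EVAL_MODES:
    subprocess.run(
        [
            "python3", "src/train.py",
            "--algorithm", "soda",
            "--aux_lr", "3e-4",
            "--seed", "0",
            "--eval_mode", mode,
        ],
        check=True,  # stop the sweep if a run fails
    )
```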
## Results

SODA demonstrates significantly improved generalization over previous methods, exhibits stable training, and its sample efficiency is comparable to that of the baseline SAC. The average return of SODA and the baselines in the `train` and `color_hard` environments is shown below.

We also provide a full comparison of the SODA, PAD, RAD, and CURL methods on all four test environments. Results for `video_easy` and `color_hard` are shown below:

See [our paper](https://arxiv.org/abs/2011.13389) for more results.
## Acknowledgements
We want to thank the numerous researchers and engineers whose work this implementation builds on. This benchmark is a product of our work on [SODA](https://arxiv.org/abs/2011.13389) and [PAD](https://arxiv.org/abs/2007.04309); our SAC implementation is based on [this repository](https://github.com/denisyarats/pytorch_sac_ae), the original DMControl is available [here](https://github.com/deepmind/dm_control), and the gym wrapper for it is available [here](https://github.com/denisyarats/dmc2gym). The PAD, RAD, and CURL baselines are based on their official implementations, provided [here](https://github.com/nicklashansen/policy-adaptation-during-deployment), [here](https://github.com/MishaLaskin/rad), and [here](https://github.com/MishaLaskin/curl), respectively.