Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Official implementation for: Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning (ICLR'24)
https://github.com/quantumiracle/consistency_model_for_reinforcement_learning
- Host: GitHub
- URL: https://github.com/quantumiracle/consistency_model_for_reinforcement_learning
- Owner: quantumiracle
- License: apache-2.0
- Created: 2024-02-06T04:09:43.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-08-28T18:11:30.000Z (3 months ago)
- Last Synced: 2024-10-03T19:43:30.186Z (about 2 months ago)
- Language: Python
- Size: 48.8 KB
- Stars: 22
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Consistency Models for RL — Official PyTorch Implementation
Official implementation for: **Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning** (ICLR'24)
Zihan Ding, Chi Jin
[https://arxiv.org/abs/2309.16984](https://arxiv.org/abs/2309.16984)

## Requirements
Installations of [PyTorch](https://pytorch.org/), [MuJoCo](https://github.com/deepmind/mujoco), and [D4RL](https://github.com/Farama-Foundation/D4RL) are needed. Please see `requirements.txt` for environment setup details.
```
pip install -r requirements.txt
```

## Run
You can use either a diffusion model or a consistency model as the policy class.
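The key practical difference between the two is sampling cost: a diffusion policy denoises an action over many network evaluations, while a consistency policy maps noise to an action in a single step (or a few). Below is a minimal sketch of one-step action sampling from a consistency policy; the `policy` call signature and the `sigma_max` value are illustrative assumptions, not this repo's exact API.

```
import torch

# Sketch: one-step action sampling with a consistency policy.
# A consistency model f(x, sigma | s) maps any point on the diffusion
# trajectory back to its origin, so a single evaluation at sigma_max
# turns pure noise into an action.
@torch.no_grad()
def sample_action(policy, state, action_dim, sigma_max=80.0):
    noise = torch.randn(state.shape[0], action_dim) * sigma_max  # x_T ~ N(0, sigma_max^2 I)
    sigma = torch.full((state.shape[0],), sigma_max)
    action = policy(noise, sigma, state)  # f(x_T, sigma_max | s) -> a
    return action.clamp(-1.0, 1.0)        # MuJoCo actions live in [-1, 1]
```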
### Dataset

First, download the D4RL dataset:
```
python download_data.py
```
The data will be saved in `./dataset/`.
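Under the hood, D4RL exposes datasets through a standard API; the snippet below sketches what a download script typically does (the exact contents of `download_data.py` and the on-disk format are assumptions).

```
import os
import gym
import numpy as np
import d4rl  # noqa: F401 -- registers the offline-RL environments with gym

os.makedirs('./dataset', exist_ok=True)
for env_name in ['hopper-medium-v2', 'walker2d-medium-expert-v2']:
    env = gym.make(env_name)
    # Returns observations, actions, rewards, next_observations, terminals.
    data = d4rl.qlearning_dataset(env)
    np.savez(os.path.join('./dataset', env_name + '.npz'), **data)
```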
### Offline RL

```
# train offline RL Consistency-AC for hopper-medium-v2 task
python offline.py --env_name hopper-medium-v2 --model consistency --ms offline --exp RUN_NAME --save_best_model --lr_decay
# train offline RL Diffusion-QL for walker2d-medium-expert-v2 task
python offline.py --env_name walker2d-medium-expert-v2 --model diffusion --ms offline --exp RUN_NAME --save_best_model --lr_decay
```
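Both variants follow the Diffusion-QL recipe for the offline actor update: a behavior-cloning (denoising) term on dataset actions plus a Q-maximization term. A rough sketch of that objective is below; `bc_loss`, `sample`, `critic`, and `eta` mirror the Diffusion-QL paper (arXiv:2208.06193) and are not necessarily this repo's exact names.

```
# Sketch of a Diffusion-QL-style actor loss (hypothetical names).
def actor_loss(actor, critic, state, action, eta=1.0):
    bc = actor.bc_loss(action, state)  # fit the behavior distribution
    new_action = actor.sample(state)   # actions from the current policy
    q = critic(state, new_action)
    # Normalizing by |Q| keeps the two terms on a comparable scale.
    return bc - eta * q.mean() / q.abs().mean().detach()
```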
### Online RL
From scratch:
```
# train online RL Consistency-AC for hopper-medium-v2 task
python online.py --env_name hopper-medium-v2 --num_envs 3 --model consistency --exp RUN_NAME
# train online RL Diffusion-QL for walker2d-medium-expert-v2 task
python online.py --env_name walker2d-medium-expert-v2 --num_envs 3 --model diffusion --exp RUN_NAME
```
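The `--num_envs` flag suggests rollouts are collected from several environment copies in parallel. A minimal sketch with gym's vector API (environment construction details and the older gym step signature, as pinned by D4RL, are assumptions):

```
import gym
import d4rl  # noqa: F401 -- registers e.g. hopper-medium-v2 with gym

# Three environment copies stepped in lockstep, matching --num_envs 3.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make('hopper-medium-v2') for _ in range(3)]
)
obs = envs.reset()                    # shape: (3, obs_dim)
actions = envs.action_space.sample()  # batched actions, shape: (3, act_dim)
obs, rewards, dones, infos = envs.step(actions)
```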
Online RL initialized with offline pre-trained models (offline-to-online):
```
python online.py --env_name kitchen-mixed-v0 --num_envs 3 --model consistency --exp online_test --load_model 'results/**PATH**' --load_id 'online'
```
As an example, with a model saved at `results/**PATH**/actor_online.pth`, the above command will load it to initialize online training.
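Loading amounts to restoring the saved actor weights before the online loop starts; a sketch of that step is below. The `Actor` architecture here is a hypothetical stand-in, and the real state-dict layout in this repo may differ.

```
import torch
import torch.nn as nn

class Actor(nn.Module):  # hypothetical stand-in for the repo's actor
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

actor = Actor(state_dim=11, action_dim=3)  # Hopper dimensions
state_dict = torch.load('results/<PATH>/actor_online.pth', map_location='cpu')
actor.load_state_dict(state_dict)
actor.train()  # resume optimizing online from the offline weights
```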
### Training Scripts

Use the bash scripts:
```
bash scripts/offline.sh
bash scripts/online.sh
```

Use the Slurm scripts:
```
sbatch scripts/offline.slurm
sbatch scripts/online.slurm
sbatch scripts/offline2online.slurm
```

## Citation
If you find this open-source release useful, please cite it in your paper:
```
@article{ding2023consistency,
  title={Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning},
  author={Ding, Zihan and Jin, Chi},
  journal={arXiv preprint arXiv:2309.16984},
  year={2023}
}
```

## Acknowledgement
We acknowledge the original official repo of [Diffusion Policy](https://github.com/Zhendong-Wang/Diffusion-Policies-for-Offline-RL) and the corresponding paper: [https://arxiv.org/abs/2208.06193](https://arxiv.org/abs/2208.06193).