https://github.com/ZhengYinan-AIR/FISOR
[ICLR 2024] The official implementation of "Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model"
diffusion-models hamilton-jacobi-reachability imitation-learning jax offline-reinforcement-learning reinforcement-learning safe-reinforcement-learning
- Host: GitHub
- URL: https://github.com/ZhengYinan-AIR/FISOR
- Owner: ZhengYinan-AIR
- Created: 2023-10-29T15:34:30.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-06-23T14:52:27.000Z (10 months ago)
- Last Synced: 2024-08-02T08:07:02.827Z (9 months ago)
- Topics: diffusion-models, hamilton-jacobi-reachability, imitation-learning, jax, offline-reinforcement-learning, reinforcement-learning, safe-reinforcement-learning
- Language: Python
- Homepage: https://zhengyinan-air.github.io/FISOR/
- Size: 13.2 MB
- Stars: 55
- Watchers: 1
- Forks: 3
- Open Issues: 0
Awesome Lists containing this project
- awesome-diffusion-model-in-rl - official
# Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
International Conference on Learning Representations (ICLR), 2024 [**[Project Page]**](https://zhengyinan-air.github.io/FISOR/) [**[arXiv]**](https://arxiv.org/pdf/2401.10700.pdf) [**[OpenReview]**](https://openreview.net/forum?id=j5JvZCaDM0)
[Yinan Zheng*](https://scholar.google.com/citations?user=mHXjEbQAAAAJ&hl=zh-CN&authuser=1), [Jianxiong Li*](https://facebear-ljx.github.io/), [Dongjie Yu](https://manutdmoon.github.io/), [Yujie Yang](https://yangyujie-jack.github.io/), [Shengbo Eben Li](https://scholar.google.com/citations?user=Dxiw1K8AAAAJ&hl=zh-CN), [Xianyuan Zhan](https://zhanzxy5.github.io/zhanxianyuan/), [Jingjing Liu](https://air.tsinghua.edu.cn/en/info/1046/1194.htm)
The official implementation of FISOR, which **represents a pioneering effort in considering hard constraints (Hamilton-Jacobi Reachability) within the safe offline RL setting**.
# Methods
FISOR transforms the original tightly coupled safety-constrained offline RL problem into three decoupled, simple supervised objectives:
- Offline identification of the largest feasible region;
- Optimal advantage learning;
- Optimal policy extraction via a time-independent classifier-guided diffusion model, enhancing both performance and stability.
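
To make the first objective more concrete, the sketch below computes a Hamilton-Jacobi reachability feasible value on a toy 1-D chain. It assumes the standard discount-free backup `V(s) = max(h(s), min_a V(s'))`, where `h(s) > 0` marks a constraint violation, and treats states with `V(s) <= 0` as feasible. The chain, its drift dynamics, and every name used (`step`, `feasible_value`, `N_STATES`) are hypothetical illustrations. FISOR learns this quantity from offline data with neural networks rather than by tabular iteration, so this is not the repository's implementation.

```python
# Toy illustration only: a tabular HJ-reachability backup on a hypothetical 1-D chain.
N_STATES = 8
ACTIONS = (-1, +1)

# h(s) > 0 means the state violates the constraint; only the last state is unsafe.
h = [-1.0] * (N_STATES - 1) + [1.0]


def step(s, a):
    """Deterministic dynamics: states >= 5 are pushed two cells to the right."""
    drift = 2 if s >= 5 else 0
    return max(0, min(N_STATES - 1, s + a + drift))


def feasible_value(h, n_iters=50):
    """Iterate V(s) <- max(h(s), min_a V(step(s, a))) for a fixed number of sweeps."""
    v = list(h)
    for _ in range(n_iters):
        # The comprehension reads the previous sweep's values before v is rebound.
        v = [max(h[s], min(v[step(s, a)] for a in ACTIONS)) for s in range(N_STATES)]
    return v


v = feasible_value(h)
feasible = [s for s in range(N_STATES) if v[s] <= 0]
print(feasible)
```

In this toy setup, states 5 and 6 satisfy the constraint instantaneously (`h < 0`) but are still excluded from the feasible set, because every action from them eventually leads to the unsafe state. That gap between "currently safe" and "feasible" is what a reachability-based feasible region is meant to capture.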
## Branches Overview
| Branch name | Usage |
|:---: |:---: |
| [master](https://github.com/ZhengYinan-AIR/FISOR) | FISOR implementation for ``Point Robot``, ``Safety-Gymnasium`` and ``Bullet-Safety-Gym``; data quantity experiment; feasible region visualization. |
| [metadrive_imitation](https://github.com/ZhengYinan-AIR/FISOR/tree/metadrive_imitation) | FISOR implementation for ``MetaDrive``; data quantity experiment; imitation learning experiment. |
## Installation
``` Bash
# Create the environment under the same name that is activated on the next line
conda create -n FISOR python=3.9
conda activate FISOR
git clone https://github.com/ZhengYinan-AIR/FISOR.git
cd FISOR
pip install -r requirements.txt
```
## Main results
To train FISOR on a single environment, run
``` Bash
# OfflineCarButton1Gymnasium-v0
export XLA_PYTHON_CLIENT_PREALLOCATE=False
python launcher/examples/train_offline.py --env_id 0 --config configs/train_config.py:fisor
```
where ``env_id`` serves as an index for the [list of environments](https://github.com/ZhengYinan-AIR/FISOR/blob/master/env/env_list.py).
## Data Quantity Experiments
Run [filter_data.py](https://github.com/ZhengYinan-AIR/FISOR/blob/master/filter_data.py) to generate offline data of varying volumes, or download the necessary offline datasets directly ([Download link](https://cloud.tsinghua.edu.cn/d/591cf8fd6d8649a89df4/)). Then run
``` Bash
python launcher/examples/train_offline.py --env_id 17 --config configs/train_config.py:fisor --ratio 0.1
```
where ``ratio`` is the size of the processed data as a proportion of the original dataset.
## Feasible Region Visualization
First, download the offline dataset for the ``Point Robot`` environment ([Download link](https://cloud.tsinghua.edu.cn/d/162d6fe92bde43e28676/)). Then train FISOR in the ``Point Robot`` environment:
``` Bash
python launcher/examples/train_offline.py --env_id 29 --config configs/train_config.py:fisor
```
Then visualize the feasible region by running [viz_map.py](https://github.com/ZhengYinan-AIR/FISOR/blob/master/launcher/viz/viz_map.py).
## Bibtex
If you find our code and paper helpful, please cite our paper as:
```
@inproceedings{
zheng2024safe,
title={Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model},
author={Yinan Zheng and Jianxiong Li and Dongjie Yu and Yujie Yang and Shengbo Eben Li and Xianyuan Zhan and Jingjing Liu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=j5JvZCaDM0}
}
```
## Acknowledgements
Parts of this code are adapted from [IDQL](https://github.com/philippe-eecs/IDQL) and [DRPO](https://github.com/ManUtdMoon/Distributional-Reachability-Policy-Optimization).