https://github.com/ZhengYinan-AIR/FISOR

[ICLR 2024] The official implementation of "Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model"

diffusion-models hamilton-jacobi-reachability imitation-learning jax offline-reinforcement-learning reinforcement-learning safe-reinforcement-learning


Host: GitHub
URL: https://github.com/ZhengYinan-AIR/FISOR
Owner: ZhengYinan-AIR
Created: 2023-10-29T15:34:30.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2025-02-11T09:37:33.000Z (3 months ago)
Last Synced: 2025-02-11T10:36:05.629Z (3 months ago)
Topics: diffusion-models, hamilton-jacobi-reachability, imitation-learning, jax, offline-reinforcement-learning, reinforcement-learning, safe-reinforcement-learning
Language: Python
Homepage: https://zhengyinan-air.github.io/FISOR/
Size: 13.1 MB
Stars: 89
Watchers: 2
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-diffusion-model-in-rl - official

README

# Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

International Conference on Learning Representations (ICLR), 2024

[**[Project Page]**](https://zhengyinan-air.github.io/FISOR/) [**[Arxiv]**](https://arxiv.org/pdf/2401.10700.pdf) [**[Openreview]**](https://openreview.net/forum?id=j5JvZCaDM0)

[Yinan Zheng*](https://scholar.google.com/citations?user=mHXjEbQAAAAJ&hl=zh-CN&authuser=1), [Jianxiong Li*](https://facebear-ljx.github.io/), [Dongjie Yu](https://manutdmoon.github.io/), [Yujie Yang](https://yangyujie-jack.github.io/), [Shengbo Eben Li](https://scholar.google.com/citations?user=Dxiw1K8AAAAJ&hl=zh-CN), [Xianyuan Zhan](https://zhanzxy5.github.io/zhanxianyuan/), [Jingjing Liu](https://air.tsinghua.edu.cn/en/info/1046/1194.htm)

This is the official implementation of FISOR, which **represents a pioneering effort to consider hard constraints (Hamilton-Jacobi reachability) in the safe offline RL setting**.

## Methods

FISOR transforms the original tightly-coupled safety-constrained offline RL problem into three decoupled, simple supervised objectives:

- Offline identification of the largest feasible region;

- Optimal advantage learning;

- Optimal policy extraction via a time-independent classifier-guided diffusion model, which enhances both performance and stability.
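The first objective above rests on Hamilton-Jacobi reachability. As a rough illustration (not code from this repository), the sketch below runs tabular value iteration with the discounted reachability backup commonly used in safe RL, `Q_h(s, a) = (1 - γ) h(s) + γ max(h(s), V_h(s'))` with `V_h(s) = min_a Q_h(s, a)`, where `h(s) > 0` marks a constraint violation and the feasible region is `{s : V_h(s) <= 0}`. The constraint values `H`, the transition table `NEXT` and `GAMMA = 0.9` are hypothetical toy choices. Because an offline learner cannot evaluate actions that are missing from the dataset, the sketch also includes an `expectile` helper showing how a low expectile of in-sample values approximates a minimum; treat this as an assumption about the offline approximation and consult the paper for FISOR's exact losses.

```python
# Hypothetical toy setup: a 5-state chain, not an environment from this repo.
GAMMA = 0.9
# h(s) > 0 marks a constraint violation; only state 4 is unsafe here.
H = [-1.0, -1.0, -1.0, -1.0, 1.0]
# NEXT[s] lists the successor state for each of the two actions.
# State 3 is a "slippery ledge": every action leads into the unsafe state 4.
NEXT = {0: [0, 1], 1: [0, 2], 2: [1, 3], 3: [4, 4], 4: [3, 4]}


def reachability_target(h_s, v_next, gamma=GAMMA):
    """Discounted HJ reachability backup for one (s, a, s') transition."""
    return (1.0 - gamma) * h_s + gamma * max(h_s, v_next)


def feasible_value(h, nxt, gamma=GAMMA, iters=500):
    """Tabular value iteration with V_h(s) = min_a Q_h(s, a)."""
    v = [0.0] * len(h)
    for _ in range(iters):
        # The comprehension reads the previous iterate `v` (synchronous update).
        v = [min(reachability_target(h[s], v[s2], gamma) for s2 in nxt[s])
             for s in range(len(h))]
    return v


def expectile(values, tau, iters=100):
    """tau-expectile of a sample; a small tau pulls it toward min(values).

    The expectile m solves sum_i w_i * (x_i - m) = 0 with
    w_i = tau if x_i > m else 1 - tau, found here by fixed-point iteration.
    """
    m = sum(values) / len(values)
    for _ in range(iters):
        w = [tau if x > m else 1.0 - tau for x in values]
        m = sum(wi * x for wi, x in zip(w, values)) / sum(w)
    return m


if __name__ == "__main__":
    v = feasible_value(H, NEXT)
    print([round(x, 3) for x in v])
    print("feasible states:", [s for s, x in enumerate(v) if x <= 0])
```

Running it should report feasible states `[0, 1, 2]`. State 3 satisfies the constraint (`h(3) < 0`) yet lies outside the feasible region, because every action from it leads into the unsafe state; this gap between the constraint-satisfying set and the feasible region is exactly what reachability analysis captures.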







## Branches Overview

| Branch name | Usage |
|:---:|:---:|
| [master](https://github.com/ZhengYinan-AIR/FISOR) | FISOR implementation for ``Point Robot``, ``Safety-Gymnasium`` and ``Bullet-Safety-Gym``; data quantity experiment; feasible region visualization. |
| [metadrive_imitation](https://github.com/ZhengYinan-AIR/FISOR/tree/metadrive_imitation) | FISOR implementation for ``MetaDrive``; data quantity experiment; imitation learning experiment. |

## Installation

``` Bash
conda create -n FISOR python=3.9
conda activate FISOR
git clone https://github.com/ZhengYinan-AIR/FISOR.git
cd FISOR
pip install -r requirements.txt
```

## Main results

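To train FISOR on ``OfflineCarButton1Gymnasium-v0``, run:

``` Bash
# OfflineCarButton1Gymnasium-v0
# Disable JAX's default preallocation of most GPU memory
export XLA_PYTHON_CLIENT_PREALLOCATE=False
python launcher/examples/train_offline.py --env_id 0 --config configs/train_config.py:fisor
```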

where ``env_id`` serves as an index for the [list of environments](https://github.com/ZhengYinan-AIR/FISOR/blob/master/env/env_list.py).
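To train on several environments in a row, one option is a small Python wrapper that shells out to the same entry point. The helpers below (`build_train_command`, `run_sweep`) are hypothetical and not part of the repository; they use only the script path and the `--env_id`, `--config` and `--ratio` flags shown in this README, and assume they are run from the repository root.

```python
import os
import subprocess
import sys

TRAIN_SCRIPT = "launcher/examples/train_offline.py"
DEFAULT_CONFIG = "configs/train_config.py:fisor"


def build_train_command(env_id, config=DEFAULT_CONFIG, ratio=None):
    """Return the argv list for one training run (hypothetical helper)."""
    cmd = [sys.executable, TRAIN_SCRIPT, "--env_id", str(env_id), "--config", config]
    if ratio is not None:
        # Only needed for the data quantity experiments.
        cmd += ["--ratio", str(ratio)]
    return cmd


def run_sweep(env_ids, dry_run=True):
    """Run (or, by default, only print) one training job per environment index."""
    # Same effect as `export XLA_PYTHON_CLIENT_PREALLOCATE=False` in the shell.
    env = dict(os.environ, XLA_PYTHON_CLIENT_PREALLOCATE="False")
    commands = [build_train_command(i) for i in env_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    run_sweep([0, 17, 29])  # dry run: prints the commands only
```

Running the jobs sequentially with `check=True` stops the sweep at the first failed run; pass `dry_run=False` only once the printed commands look right.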

## Data Quantity Experiments

Run [filter_data.py](https://github.com/ZhengYinan-AIR/FISOR/blob/master/filter_data.py) to generate offline datasets of varying sizes, or download the required offline datasets ([Download link](https://cloud.tsinghua.edu.cn/d/591cf8fd6d8649a89df4/)). Then run

``` Bash
python launcher/examples/train_offline.py --env_id 17 --config configs/train_config.py:fisor --ratio 0.1
```

where ``ratio`` is the size of the processed data as a fraction of the original dataset (0.1 keeps 10%).
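This README does not spell out how `filter_data.py` decides which data to keep. As an illustration only, the sketch below assumes uniform random subsampling of whole trajectories so that each kept episode stays intact; `subsample_trajectories` is a hypothetical function, and the real script may select data differently.

```python
import random


def subsample_trajectories(trajectories, ratio, seed=0):
    """Keep about `ratio` of the trajectories, drawn uniformly without replacement.

    Hypothetical scheme for illustration; filter_data.py may differ.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    n_keep = max(1, round(ratio * len(trajectories)))
    keep = sorted(rng.sample(range(len(trajectories)), n_keep))
    # Preserve the original ordering of the kept trajectories.
    return [trajectories[i] for i in keep]


if __name__ == "__main__":
    # 100 toy "trajectories" of 5 transitions each.
    data = [[(ep, t) for t in range(5)] for ep in range(100)]
    subset = subsample_trajectories(data, ratio=0.1)
    n_transitions = sum(len(traj) for traj in subset)
    print(len(subset), n_transitions)  # 10 50
```

Sampling whole episodes rather than individual transitions is a design choice of this sketch; it keeps episode-level quantities such as cumulative cost well defined in the reduced dataset.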

## Feasible Region Visualization

First, download the offline dataset for the ``Point Robot`` environment ([Download link](https://cloud.tsinghua.edu.cn/d/162d6fe92bde43e28676/)). Then train FISOR in the ``Point Robot`` environment:

``` Bash
python launcher/examples/train_offline.py --env_id 29 --config configs/train_config.py:fisor
```

Then visualize the feasible region by running [viz_map.py](https://github.com/ZhengYinan-AIR/FISOR/blob/master/launcher/viz/viz_map.py).
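Conceptually, a feasible-region map colours each position by the sign of the learned feasible value: states with `V_h(s) <= 0` are classified as feasible. The dependency-free sketch below shows that idea on a grid. `toy_feasible_value` is a hypothetical analytic stand-in for a trained network (a disk-shaped hazard enlarged by a safety margin), and the sketch does not reproduce how `viz_map.py` loads a model or draws its plots.

```python
def toy_feasible_value(x, y, cx=0.0, cy=0.0, radius=1.0, margin=0.5):
    """Hypothetical stand-in for a learned V_h: positive near a disk hazard."""
    dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
    return (radius + margin) - dist


def feasibility_grid(value_fn, xs, ys):
    """Boolean grid (rows indexed by y), True where value_fn(x, y) <= 0."""
    return [[value_fn(x, y) <= 0 for x in xs] for y in ys]


def render(grid):
    """ASCII map: '.' feasible, '#' infeasible; the top row is the largest y."""
    return "\n".join("".join("." if ok else "#" for ok in row)
                     for row in reversed(grid))


if __name__ == "__main__":
    coords = [0.5 * i for i in range(-6, 7)]  # -3.0 ... 3.0
    print(render(feasibility_grid(toy_feasible_value, coords, coords)))
```

The infeasible blob is larger than the hazard itself because of the margin, mirroring how a feasible region excludes states that are currently safe but cannot avoid a future violation.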







## Bibtex

If you find our code or paper helpful, please cite our paper:

```
@inproceedings{zheng2024safe,
  title={Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model},
  author={Yinan Zheng and Jianxiong Li and Dongjie Yu and Yujie Yang and Shengbo Eben Li and Xianyuan Zhan and Jingjing Liu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=j5JvZCaDM0}
}
```

## Acknowledgements

Parts of this code are adapted from [IDQL](https://github.com/philippe-eecs/IDQL) and [DRPO](https://github.com/ManUtdMoon/Distributional-Reachability-Policy-Optimization).
