{"id":18004578,"url":"https://github.com/liuzuxin/osrl","last_synced_at":"2025-12-14T02:02:03.499Z","repository":{"id":175456547,"uuid":"625073424","full_name":"liuzuxin/OSRL","owner":"liuzuxin","description":"🤖 Elegant implementations of offline safe RL algorithms in PyTorch","archived":false,"fork":false,"pushed_at":"2024-09-13T17:01:21.000Z","size":1508,"stargazers_count":196,"open_issues_count":1,"forks_count":13,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-02T10:34:51.683Z","etag":null,"topics":["bc-safe","cdt","cpq","library","offline-rl","offline-safe-rl","pytorch","reinforcement-learning","robotics","safe-rl"],"latest_commit_sha":null,"homepage":"https://offline-saferl.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/liuzuxin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-08T01:59:13.000Z","updated_at":"2025-03-29T07:12:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"747990eb-a47f-4165-af55-4e24d7f4e7a5","html_url":"https://github.com/liuzuxin/OSRL","commit_stats":null,"previous_names":["liuzuxin/osrl","liuzuxin/offline-safe-rl-baselines"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liuzuxin%2FOSRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liuzuxin%2FOSRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liuzuxin%2FOSRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/liuzuxin%2FOSRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/liuzuxin","download_url":"https://codeload.github.com/liuzuxin/OSRL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247975774,"owners_count":21026937,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bc-safe","cdt","cpq","library","offline-rl","offline-safe-rl","pytorch","reinforcement-learning","robotics","safe-rl"],"created_at":"2024-10-30T00:14:53.786Z","updated_at":"2025-12-14T02:01:58.455Z","avatar_url":"https://github.com/liuzuxin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"http://www.offline-saferl.org\"\u003e\u003cimg width=\"300px\" height=\"auto\" src=\"https://github.com/liuzuxin/osrl/raw/main/docs/_static/images/osrl-logo.png\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cbr/\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca\u003e![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-brightgreen.svg)\u003c/a\u003e\n  [![License](https://img.shields.io/badge/License-Apache-blue.svg)](#license)\n  [![PyPI](https://img.shields.io/pypi/v/osrl-lib?logo=pypi)](https://pypi.org/project/osrl-lib)\n  [![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/osrl?color=brightgreen\u0026logo=github)](https://github.com/liuzuxin/osrl/stargazers)\n  
[![Downloads](https://static.pepy.tech/personalized-badge/osrl-lib?period=total\u0026left_color=grey\u0026right_color=blue\u0026left_text=downloads)](https://pepy.tech/project/osrl-lib)\n  \u003c!-- [![Documentation Status](https://img.shields.io/readthedocs/fsrl?logo=readthedocs)](https://fsrl.readthedocs.io) --\u003e\n  \u003c!-- [![CodeCov](https://codecov.io/github/liuzuxin/fsrl/branch/main/graph/badge.svg?token=BU27LTW9F3)](https://codecov.io/github/liuzuxin/fsrl)\n  [![Tests](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml/badge.svg)](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml) --\u003e\n  \u003c!-- [![CodeCov](https://img.shields.io/codecov/c/github/liuzuxin/fsrl/main?logo=codecov)](https://app.codecov.io/gh/liuzuxin/fsrl) --\u003e\n  \u003c!-- [![tests](https://img.shields.io/github/actions/workflow/status/liuzuxin/fsrl/test.yml?label=tests\u0026logo=github)](https://github.com/liuzuxin/fsrl/tree/HEAD/tests) --\u003e\n\n\u003c/div\u003e\n\n---\n\n**OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions. This repository is heavily inspired by the [CORL](https://github.com/corl-team/CORL) library for offline RL, check them out too!\n\nThe OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [DSRL](https://github.com/liuzuxin/DSRL) and [FSRL](https://github.com/liuzuxin/FSRL), and is built to facilitate the development of robust and reliable offline safe RL solutions.\n\nTo learn more, please visit our [project website](http://www.offline-saferl.org). 
If you find this code useful, please cite our paper, which has been accepted by the [DMLR journal](https://data.mlr.press/volumes/01.html):\n```bibtex\n@article{\n  liu2024offlinesaferl,\n  title={Datasets and Benchmarks for Offline Safe Reinforcement Learning},\n  author={Zuxin Liu and Zijian Guo and Haohong Lin and Yihang Yao and Jiacheng Zhu and Zhepeng Cen and Hanjiang Hu and Wenhao Yu and Tingnan Zhang and Jie Tan and Ding Zhao},\n  journal={Journal of Data-centric Machine Learning Research},\n  year={2024}\n}\n```\n\n## Structure\nThe structure of this repo is as follows:\n```\n├── examples\n│   ├── configs  # the training configs of each algorithm\n│   ├── eval     # the evaluation scripts\n│   ├── train    # the training scripts\n├── osrl\n│   ├── algorithms  # offline safe RL algorithms\n│   ├── common      # base networks and utils\n```\nThe implemented offline safe RL and imitation learning algorithms include:\n\n| Algorithm           | Type           | Description           |\n|:-------------------:|:-----------------:|:------------------------:|\n| BCQ-Lag             | Q-learning           | [BCQ](https://arxiv.org/pdf/1812.02900.pdf) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |\n| BEAR-Lag            | Q-learning           | [BEAR](https://arxiv.org/abs/1906.00949) with [PID Lagrangian](https://arxiv.org/abs/2007.03964)   |\n| CPQ                 | Q-learning           | [Constraints Penalized Q-learning (CPQ)](https://arxiv.org/abs/2107.09003) |\n| COptiDICE           | Distribution Correction Estimation           | [Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation](https://arxiv.org/abs/2204.08957) |\n| CDT                 | Sequential Modeling | [Constrained Decision Transformer](https://arxiv.org/abs/2302.07351) |\n| BC-All                 | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with all datasets |\n| BC-Safe                 | Imitation Learning | [Behavior 
Cloning](https://arxiv.org/abs/2302.07351) with safe trajectories |\n| BC-Frontier                 | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with high-reward trajectories |\n\n\n## Installation\n\nOSRL is hosted on [PyPI](https://pypi.org/project/osrl-lib); you can install it with:\n\n```bash\npip install osrl-lib\n```\n\nYou can also clone the repo and install from source:\n```bash\ngit clone https://github.com/liuzuxin/OSRL.git\ncd OSRL\npip install -e .\n```\n\nIf you want to use the `CDT` algorithm, please also manually install the `OApackage`:\n```bash\npip install OApackage==2.7.6\n```\n\n## How to use OSRL\n\nUsage examples are in the `examples` folder, which contains the training and evaluation scripts for all the algorithms. \nAll the parameters and their default configs for each algorithm are available in the `examples/configs` folder. \nOSRL uses the `WandbLogger` from [FSRL](https://github.com/liuzuxin/FSRL) and the [Pyrallis](https://github.com/eladrich/pyrallis) configuration system. 
The offline datasets and environments are provided in [DSRL](https://github.com/liuzuxin/DSRL), so make sure you install both FSRL and DSRL first.\n\n### Training\nFor example, to train the `bcql` method, run the training script and override the default parameters as needed:\n\n```shell\npython examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...\n```\nBy default, the config file and the training logs are written to the `logs/` folder, and the training plots can be viewed online using Wandb.\n\nYou can also launch experiments sequentially or in parallel via the [EasyRunner](https://github.com/liuzuxin/easy-runner) package; see `examples/train_all_tasks.py` for details.\n\n### Evaluation\nTo evaluate a trained agent, for example a BCQ agent, run:\n```shell\npython examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20\n```\nIt loads the config file from `path_to_model/config.yaml` and the model file from `path_to_model/checkpoints/model.pt`, runs 20 episodes, and prints the average normalized reward and cost. The pretrained checkpoints for all datasets are available [here](https://drive.google.com/drive/folders/1lZmw2NVNR4YGUdrkih9o3rTMDrWCI_jw?usp=sharing) for reference.\n\n## Acknowledgement\n\nThe framework design and most baseline implementations of OSRL are heavily inspired by the [CORL](https://github.com/corl-team/CORL) project, which is a great library for offline RL, and the [cleanrl](https://github.com/vwxyzjn/cleanrl) project, which targets online RL. Do check them out if you are interested!\n\n\n## Contributing\n\nIf you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community! \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliuzuxin%2Fosrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliuzuxin%2Fosrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliuzuxin%2Fosrl/lists"}