{"id":27948326,"url":"https://github.com/pku-alignment/safedreamer","last_synced_at":"2025-05-07T14:57:38.552Z","repository":{"id":217644005,"uuid":"744406105","full_name":"PKU-Alignment/SafeDreamer","owner":"PKU-Alignment","description":"ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models","archived":false,"fork":false,"pushed_at":"2024-04-08T02:08:21.000Z","size":2479,"stargazers_count":66,"open_issues_count":4,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-07T14:57:30.337Z","etag":null,"topics":["constraint-rl","constraint-satisfaction-problem","reinforcement-learning","safe-policy-optimization","safe-reinforcement-learning","safety-critical-systems"],"latest_commit_sha":null,"homepage":"https://sites.google.com/view/safedreamer","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PKU-Alignment.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-01-17T08:31:37.000Z","updated_at":"2025-05-02T11:08:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"73c83e79-df7b-486f-a926-954207a15f32","html_url":"https://github.com/PKU-Alignment/SafeDreamer","commit_stats":{"total_commits":2,"total_committers":1,"mean_commits":2.0,"dds":0.0,"last_synced_commit":"9e9e70f746a6d8e29671ac3e944adbaaa8fe9981"},"previous_names":["pku-alignment/safedreamer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2FSafeDreamer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alig
nment%2FSafeDreamer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2FSafeDreamer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2FSafeDreamer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PKU-Alignment","download_url":"https://codeload.github.com/PKU-Alignment/SafeDreamer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252902622,"owners_count":21822257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["constraint-rl","constraint-satisfaction-problem","reinforcement-learning","safe-policy-optimization","safe-reinforcement-learning","safety-critical-systems"],"created_at":"2025-05-07T14:57:37.907Z","updated_at":"2025-05-07T14:57:38.540Z","avatar_url":"https://github.com/PKU-Alignment.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e\n\u003ch1 align=\"center\"\u003e\u003cimg align=\"center\" width=\"6.5%\"\u003e\u003cstrong\u003eSafeDreamer: Safe Reinforcement Learning with World Models\n\u003c/strong\u003e\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href='https://github.com/hdadong/' target='_blank'\u003eWeidong Huang\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://jijiaming.com/' target='_blank'\u003eJiaming Ji\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?hl=zh-CN\u0026user=f1BzjccAAAAJ' target='_blank'\u003eChunhe 
Xia\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://github.com/muchvo' target='_blank'\u003eBorong Zhang\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://www.yangyaodong.com/' target='_blank'\u003eYaodong Yang\u003c/a\u003e\u0026emsp;\n    \u003cbr\u003e\n    Beihang University\u0026emsp;Peking University\n  \u003c/p\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2307.07176\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/arXiv-2307.07176-blue?\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://sites.google.com/view/safedreamer\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Website-\u0026#x1F680-green\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://sites.google.com/view/safedreamer\" target='_blank'\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Model Checkpoint-\u0026#x1F60-red\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\n## 🏠 About\nThe deployment of Reinforcement Learning (RL) in real-world applications is constrained by its failure to satisfy safety criteria. Existing Safe Reinforcement Learning (SafeRL) methods, which rely on cost functions to enforce safety, often fail to achieve zero-cost performance in complex scenarios, especially vision-only tasks. These limitations are primarily due to model inaccuracies and inadequate sample efficiency. The integration of world models has proven effective in mitigating these shortcomings. In this work, we introduce SafeDreamer, a novel algorithm incorporating Lagrangian-based methods into world model planning processes within the superior Dreamer framework. Our method achieves nearly zero-cost performance on various tasks, spanning low-dimensional and vision-only input, within the Safety-Gymnasium benchmark, showcasing its efficacy in balancing performance and safety in RL tasks. 
\n\u003c!-- ![Teaser](assets/teaser.jpg) --\u003e\n\u003cdiv style=\"text-align: center;\"\u003e\n    \u003cimg src=\"assets/architecture-min.png\" alt=\"Dialogue_Teaser\" width=100% \u003e\n\u003c/div\u003e\n\nWe have also open-sourced **80+** [model checkpoints](https://huggingface.co/Weidong-Huang/SafeDreamer) for 20 tasks. Our codebase supports vector and vision observations. We hope this repository will become a valuable community resource for future research on model-based safe reinforcement learning.\n\n## 🔥 News\n- [2024-04] We have open-sourced the code and 80+ model checkpoints.\n- [2024-01] SafeDreamer has been accepted for ICLR 2024.\n\n## 🔍 Overview\n\n### Framework\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/brain-min.png\" align=\"center\" width=\"100%\"\u003e\n\u003c/p\u003e\nThe architecture of SafeDreamer. (a) illustrates all components of SafeDreamer, which distinguishes costs as safety indicators from rewards and balances them using the Lagrangian method and a safe planner. The OSRP (b) and OSRP-Lag (c) variants execute online safety-reward planning (OSRP) within the world models for action generation. OSRP-Lag integrates online planning with the Lagrangian approach to balance long-term rewards and costs. The BSRP-Lag variant of SafeDreamer (d) employs background safety-reward planning (BSRP) via the Lagrangian method within the world models to update a safe actor. 
\n\n\n## 🔗 Citation\n\nIf you find our work helpful, please cite:\n\n```bibtex\n@inproceedings{\nsafedreamer,\ntitle={SafeDreamer: Safe Reinforcement Learning with World Models},\nauthor={Weidong Huang and Jiaming Ji and Borong Zhang and Chunhe Xia and Yaodong Yang},\nbooktitle={The Twelfth International Conference on Learning Representations},\nyear={2024},\nurl={https://openreview.net/forum?id=tsE5HLYtYg}\n}\n```\n\n## Instructions\n\n### Step0: Git clone\n```sh\ngit clone https://github.com/PKU-Alignment/SafeDreamer.git\ncd SafeDreamer\n```\n\n### Step1: Check the CUDA and cuDNN versions (if using a GPU)\nJAX depends strongly on CUDA and cuDNN, so the installed versions must be compatible for the code to run. Before installing JAX, check the CUDA and cuDNN versions on your machine:\n\n1. Checking the CUDA version:\n- Run `nvcc --version` in the terminal.\n\n2. Checking the cuDNN version:\n- Inspect the version header in the cuDNN installation directory: `cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2`.\n- Or use torch to print the cuDNN and CUDA versions: `python3 -c 'import torch;cudnn_version = torch.backends.cudnn.version();print(f\"CUDNN Version: {cudnn_version}\");print(torch.version.cuda)'`\n\nMake sure the installed CUDA and cuDNN versions are compatible with the specific version of JAX you intend to install.\n\n### Step2: Install jax\nHere are some suggestions for installing jax; up-to-date instructions can be found in the [jax](https://github.com/google/jax) documentation. 
We tested our code with jax 0.3.25.\n\n```sh\nconda create -n example python=3.8\nconda activate example\npip install --upgrade pip\npip install jax==0.3.25\npip install jax-jumpy==1.0.0\n# for gpu\npip install jaxlib==0.3.25+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html\n# for cpu\npip install jaxlib==0.3.25\n```\n\n### Step3: Install Other Dependencies\n```sh\npip install -r requirements.txt\n```\n\n### Step4: Install Safety-Gymnasium\n```sh\ngit clone https://github.com/PKU-Alignment/safety-gymnasium.git\ncd safety-gymnasium\npip install -e .\ncd ..\n```\n\n### Step5: Evaluation using Checkpoints\nYou can download a checkpoint from [Hugging Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main) and run it locally without training from scratch. To check that the code runs correctly, we recommend downloading [the checkpoints of SafeDreamer(OSRP-Vector)](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/safedreamer_osrp_vector), which are the smallest:\n\n\n\n| Algorithm | Size | Checkpoint Link |\n| :----------------: | :--: | :-------------: |\n|SafeDreamer(BSRP-Lag)| 392MB | [Hugging Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/safedreamer_bsrplag) \n|SafeDreamer(OSRP-Lag)| 392MB | [Hugging Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/safedreamer_osrplag) \n|SafeDreamer(OSRP)| 392MB | [Hugging Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/safedreamer_osrp) \n|SafeDreamer(OSRP-Vector)| 26.6MB | [Hugging 
Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/safedreamer_osrp_vector) \n|Unsafe-DreamerV3| 340MB | [Hugging Face](https://huggingface.co/Weidong-Huang/SafeDreamer/tree/main/unsafe_dreamerv3) \n\n```sh\n# Background Safety-Reward Planning with Lagrangian (BSRP-Lag):\npython SafeDreamer/train.py --configs bsrp_lag --method bsrp_lag --run.script eval_only --run.from_checkpoint /xxx/checkpoint.ckpt --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0 --run.steps 10000\n\n# Online Safety-Reward Planning with Lagrangian (OSRP-Lag):\npython SafeDreamer/train.py --configs osrp_lag --method osrp_lag --run.script eval_only --run.from_checkpoint /xxx/checkpoint.ckpt --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0 --run.steps 10000 --pid.init_penalty 0.1\n\n# Online Safety-Reward Planning (OSRP):\npython SafeDreamer/train.py --configs osrp --method osrp --run.script eval_only --run.from_checkpoint /xxx/checkpoint.ckpt --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0 --run.steps 10000\n\n# Online Safety-Reward Planning (OSRP) for low-dimensional input:\npython SafeDreamer/train.py --configs osrp_vector --method osrp --run.script eval_only --run.from_checkpoint /xxx/checkpoint.ckpt --task safetygymcoor_SafetyPointGoal1-v0 --jax.logical_gpus 0 --run.steps 10000\n```\n\nReplace `/xxx/checkpoint.ckpt` with the path to your downloaded checkpoint. 
If you run on CPU, replace `--jax.logical_gpus 0` with `--jax.platform cpu`.\n\n\n### Step6: Training from Scratch\n```sh\n# For cpu:\npython SafeDreamer/train.py --configs osrp --method osrp --task safetygym_SafetyPointGoal1-v0 --jax.platform cpu\n\n# For gpu:\n# Online Safety-Reward Planning (OSRP):\npython SafeDreamer/train.py --configs osrp --method osrp --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0\n\n# Online Safety-Reward Planning with Lagrangian (OSRP-Lag):\npython SafeDreamer/train.py --configs osrp_lag --method osrp_lag --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0\n\n# Background Safety-Reward Planning with Lagrangian (BSRP-Lag):\npython SafeDreamer/train.py --configs bsrp_lag --method bsrp_lag --task safetygym_SafetyPointGoal1-v0 --jax.logical_gpus 0\n\n# Online Safety-Reward Planning (OSRP) for low-dimensional input:\npython SafeDreamer/train.py --configs osrp_vector --method osrp_vector --task safetygymcoor_SafetyPointGoal1-v0 --jax.logical_gpus 0\n```\n\n## Tips\n\n- All configuration options are documented in `configs.yaml`, and you can override them on the command line.\n- If you encounter CUDA errors, scroll up through the error messages: the root cause is often an earlier issue, such as running out of memory or incompatible versions of JAX and CUDA.\n- To customize the GPU memory requirement, you can modify the `os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION']` variable in `jaxagent.py`. 
This allows you to adjust the memory allocation according to your specific needs.\n\n\n## 📄 License\nSafeDreamer is released under Apache License 2.0.\n\n\n\n## 👏 Acknowledgements\n- [DreamerV3](https://github.com/danijar/dreamerv3): Our codebase is built upon DreamerV3.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpku-alignment%2Fsafedreamer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpku-alignment%2Fsafedreamer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpku-alignment%2Fsafedreamer/lists"}