{"id":22321765,"url":"https://github.com/chauncygu/safe-reinforcement-learning-baselines","last_synced_at":"2025-10-04T22:14:51.021Z","repository":{"id":42021449,"uuid":"462251155","full_name":"chauncygu/Safe-Reinforcement-Learning-Baselines","owner":"chauncygu","description":"The repository is for safe reinforcement learning baselines.","archived":false,"fork":false,"pushed_at":"2025-04-16T22:56:50.000Z","size":75615,"stargazers_count":644,"open_issues_count":0,"forks_count":86,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-05-22T12:05:33.755Z","etag":null,"topics":["baseline","reinforcement-learning","robotics","safe-reinforcement-learning","safe-robot-learning","safety"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chauncygu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-22T10:40:44.000Z","updated_at":"2025-05-21T15:17:19.000Z","dependencies_parsed_at":"2024-12-25T03:01:53.702Z","dependency_job_id":"6fa8e432-4aef-4883-b463-1b84c39ec231","html_url":"https://github.com/chauncygu/Safe-Reinforcement-Learning-Baselines","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chauncygu/Safe-Reinforcement-Learning-Baselines","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chauncygu%2FSafe-Reinforcement-Learning-Baselines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chauncygu%2FSafe-Reinforcement-Learning-Baseli
nes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chauncygu%2FSafe-Reinforcement-Learning-Baselines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chauncygu%2FSafe-Reinforcement-Learning-Baselines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chauncygu","download_url":"https://codeload.github.com/chauncygu/Safe-Reinforcement-Learning-Baselines/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chauncygu%2FSafe-Reinforcement-Learning-Baselines/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278381894,"owners_count":25977449,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["baseline","reinforcement-learning","robotics","safe-reinforcement-learning","safe-robot-learning","safety"],"created_at":"2024-12-04T00:22:47.141Z","updated_at":"2025-10-04T22:14:51.012Z","avatar_url":"https://github.com/chauncygu.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Safe-Reinforcement-Learning-Baselines\n\n\n\n\n\nThe repository is for Safe Reinforcement Learning (RL) research, in which we investigate various safe RL baselines and safe RL benchmarks, including single 
agent RL and multi-agent RL. If any authors do not want their paper to be listed here, please feel free to contact \u003cgshangd[AT]foxmail.com\u003e. (This repository is under active development. We appreciate any constructive comments and suggestions.)\n\n\nYou are more than welcome to update this list! If you find a paper about Safe RL which is not listed here, please\n\n- fork this repository, add it and merge back;\n- or report an issue here;\n- or email \u003cgshangd[AT]foxmail.com\u003e.\n\n\n\n***\nThe README is organized as follows:\n- [Safe-Reinforcement-Learning-Baselines](#safe-reinforcement-learning-baselines)\n  * [1. Environments Supported](#1-environments-supported)\n    + [1.1. Safe Single Agent RL benchmarks](#11-safe-single-agent-rl-benchmarks)\n    + [1.2. Safe Multi-Agent RL benchmarks](#12-safe-multi-agent-rl-benchmarks)\n  * [2. Safe RL Baselines](#2-safe-rl-baselines)\n    + [2.1. Safe Single Agent RL Baselines](#21-safe-single-agent-rl-baselines)\n    + [2.2. Safe Multi-Agent RL Baselines](#22-safe-multi-agent-rl-baselines)\n  * [3. Surveys](#3-surveys)\n  * [4. Theses](#4-theses)\n  * [5. Book](#5-book)\n  * [6. Tutorials](#6-tutorials)\n  * [7. Exercise](#7-exercise)\n- [Publication](#publication)\n\n***\n\n\n\n### 1. Environments Supported\n#### 1.1. Safe Single Agent RL benchmarks\n- [AI Safety Gridworlds](https://github.com/deepmind/ai-safety-gridworlds)\n- [Safety-Gym](https://github.com/openai/safety-gym)\n- [Safety-Gymnasium](https://github.com/PKU-Alignment/safety-gymnasium)\n\n#### 1.2. Safe Multi-Agent RL benchmarks\n- [Safe Multi-Agent Mujoco](https://github.com/chauncygu/Safe-Multi-Agent-Mujoco)\n- [Safe Multi-Agent Isaac Gym](https://github.com/chauncygu/Safe-Multi-Agent-Isaac-Gym)\n- [Safe Multi-Agent Robosuite](https://github.com/chauncygu/Safe-Multi-Agent-Robosuite)\n\n\n\n### 2. Safe RL Baselines\n\n#### 2.1. 
Safe Single Agent RL Baselines\n\n- Consideration of risk in reinforcement learning, [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by ICML 1994)\n- Multi-criteria Reinforcement Learning,  [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.232.962\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by ICML 1998)\n- Lyapunov design for safe reinforcement learning, [Paper](https://www.jmlr.org/papers/volume3/perkins02a/perkins02a.pdf), Not Find Code, (Accepted by ICML 2002)\n- Risk-sensitive reinforcement learning, [Paper](https://link.springer.com/content/pdf/10.1023/A:1017940631555.pdf), Not Find Code, (Accepted by Machine Learning, 2002)\n- Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, [Paper](https://www.jair.org/index.php/jair/article/view/10415/24966), Not Find Code, (Accepted by Journal of Artificial Intelligence Research, 2005)\n- An actor-critic algorithm for constrained markov decision processes, [Paper](https://reader.elsevier.com/reader/sd/pii/S0167691104001276?token=D2FDE94E441EB4182DF4CF382458FCA57BDCABECB2E17932BF52CABA7F46F0F67EE5E9A4BE19F9FD3E27D4099CA25C80\u0026originRegion=eu-west-1\u0026originCreation=20220304073259), Not Find Code, (Accepted by Systems \u0026 Control Letters, 2005)\n- Reinforcement learning for MDPs with constraints, [Paper](https://link.springer.com/content/pdf/10.1007/11871842_63.pdf), Not Find Code, (Accepted by European Conference on Machine Learning 2006)\n- Discounted Markov decision processes with utility constraints, [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.140.1315\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by Computers \u0026 Mathematics with Applications, 2006)\n- Constrained reinforcement learning from intrinsic and extrinsic rewards, [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1059.1383\u0026rep=rep1\u0026type=pdf), Not Find Code, 
(Accepted by International Conference on Development and Learning 2007)\n- Safe exploration for reinforcement learning, [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.2786\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by ESANN 2008)\n- Percentile optimization for Markov decision processes with parameter uncertainty, [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.400.5048\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by Operations research, 2010)\n- Probabilistic goal Markov decision processes, [Paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.208.4804\u0026rep=rep1\u0026type=pdf), Not Find Code, (Accepted by IJCAI 2011)\n- Safe reinforcement learning in high-risk tasks through policy improvement, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=5967356), Not Find Code, (Accepted by IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) 2011) \n- Safe Exploration in Markov Decision Processes, [Paper](https://arxiv.org/pdf/1205.4810.pdf), Not Find Code, (Accepted by ICML 2012)\n- Policy gradients with variance related risk criteria, [Paper](https://arxiv.org/pdf/1206.6404.pdf), Not Find Code, (Accepted by ICML 2012)\n- Risk aversion in Markov decision processes via near optimal Chernoff bounds, [Paper](https://proceedings.neurips.cc/paper/2012/file/e2f374c3418c50bc30d67d5f7454a5b4-Paper.pdf), Not Find Code, (Accepted by NeurIPS 2012)\n- Safe Exploration of State and Action Spaces in Reinforcement Learning, [Paper](https://web.archive.org/web/20180423223542id_/http://www.jair.org/media/3761/live-3761-6687-jair.pdf), Not Find Code, (Accepted by Journal of Artificial Intelligence Research, 2012)\n- An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, [Paper](https://link.springer.com/content/pdf/10.1007/s10957-012-9989-5.pdf), Not Find Code, (Accepted by Journal of Optimization Theory and 
Applications, 2012)\n- Safe policy iteration, [Paper](http://proceedings.mlr.press/v28/pirotta13.pdf), Not Find Code, (Accepted by ICML 2013)\n- Reachability-based safe learning with Gaussian processes, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=7039601), Not Find Code (Accepted by IEEE CDC 2014)\n- Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret, [Paper](https://arxiv.org/pdf/1505.05798.pdf), Not Find Code, (Accepted by ICML 2015)\n- High-Confidence Off-Policy Evaluation, [Paper](https://www.ics.uci.edu/~dechter/courses/ics-295/winter-2018/papers/2015Thomas2015.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safeRL) (Accepted by AAAI 2015)\n- Safe Exploration for Optimization with Gaussian Processes, [Paper](http://proceedings.mlr.press/v37/sui15.pdf), Not Find Code (Accepted by ICML 2015)\n- Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, [Paper](https://proceedings.neurips.cc/paper/2016/file/9a49a25d845a483fae4be7e341368e36-Paper.pdf), Not Find Code (Accepted by NeurIPS 2016)\n- Safe and efficient off-policy reinforcement learning, [Paper](https://www.researchgate.net/profile/Anna-Harutyunyan-3/publication/303859091_Safe_and_Efficient_Off-Policy_Reinforcement_Learning/links/57b2e8c908aeb2cf17c73ad2/Safe-and-Efficient-Off-Policy-Reinforcement-Learning.pdf), [Code](https://github.com/ALRhub/Retrace-PyTorch) (Accepted by NeurIPS 2016)\n- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, [Paper](https://arxiv.org/pdf/1610.03295.pdf?ref=https://githubhelp.com), Not Find Code (only Arxiv, 2016, citation 530+)\n- Safe Learning of Regions of Attraction in Uncertain, Nonlinear Systems with Gaussian Processes, [Paper](https://arxiv.org/pdf/1603.04915.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safe_learning) (Accepted by CDC 2016)\n- Safety-constrained reinforcement 
learning for MDPs, [Paper](https://www.researchgate.net/profile/Nils-Jansen-2/publication/283118102_Safety-Constrained_Reinforcement_Learning_for_MDPs/links/5630d2af08aef3349c29f90f/Safety-Constrained-Reinforcement-Learning-for-MDPs.pdf), Not Find Code (Accepted by International Conference on Tools and Algorithms for the Construction and Analysis of Systems 2016)\n- Convex synthesis of randomized policies for controlled Markov chains with density safety upper bound constraints, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=7526658), Not Find Code (Accepted by American Control Conference 2016)\n- Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear, [Paper](https://openreview.net/pdf?id=r1tHvHKge), Not Find Code (only Openreview, 2016)\n- Combating reinforcement learning's sisyphean curse with intrinsic fear, [Paper](https://arxiv.org/pdf/1611.01211.pdf), Not Find Code (only Arxiv, 2016)\n- Constrained Policy Optimization (CPO), [Paper](http://proceedings.mlr.press/v70/achiam17a/achiam17a.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safety-starter-agents) (Accepted by ICML 2017)\n- Risk-constrained reinforcement learning with percentile risk criteria, [Paper](https://www.jmlr.org/papers/volume18/15-636/15-636.pdf), Not Find Code (Accepted by The Journal of Machine Learning Research, 2017)\n- Probabilistically Safe Policy Transfer, [Paper](https://arxiv.org/pdf/1705.05394.pdf), Not Find Code (Accepted by ICRA 2017)\n- Accelerated primal-dual policy optimization for safe reinforcement learning, [Paper](https://arxiv.org/pdf/1802.06480.pdf), Not Find Code (Arxiv, 2017)\n- Stagewise safe bayesian optimization with gaussian processes, [Paper](http://www.yisongyue.com/publications/icml2018_stageopt.pdf), Not Find Code (Accepted by ICML 2018)\n- Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning, 
[Paper](https://arxiv.org/pdf/1711.06782.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/LeaveNoTrace) (Accepted by ICLR 2018)\n- Safe Model-based Reinforcement Learning with Stability Guarantees, [Paper](https://proceedings.neurips.cc/paper/2017/file/766ebcd59621e305170616ba3d3dac32-Paper.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safe_learning) (Accepted by NeurIPS 2017)\n- A Lyapunov-based Approach to Safe Reinforcement Learning, [Paper](https://proceedings.neurips.cc/paper/2018/file/4fe5149039b52765bde64beb9f674940-Paper.pdf), Not Find Code (Accepted by NeurIPS 2018)\n- Constrained Cross-Entropy Method for Safe Reinforcement Learning, [Paper](https://proceedings.neurips.cc/paper/2018/file/34ffeb359a192eb8174b6854643cc046-Paper.pdf), Not Find Code (Accepted by NeurIPS 2018)\n- Safe Reinforcement Learning via Formal Methods, [Paper](http://www.cs.cmu.edu/~aplatzer/pub/SafeRL.pdf), Not Find Code (Accepted by AAAI 2018)\n- Safe exploration and optimization of constrained mdps using gaussian processes, [Paper](http://www.yisongyue.com/publications/aaai2018_safe_mdp.pdf), Not Find Code (Accepted by AAAI 2018)\n- Safe reinforcement learning via shielding, [Paper](https://arxiv.org/pdf/1708.08611.pdf), [Code](https://github.com/safe-rl/safe-rl-shielding) (Accepted by AAAI 2018)\n- Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, [Paper](https://www.ifaamas.org/Proceedings/aamas2018/pdfs/p2067.pdf), Not Find Code (Accepted by AAMAS 2018)\n- Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning, [Paper](https://arxiv.org/pdf/1906.12189.pdf), Not Find Code (Accepted by CDC 2018)\n- The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems, [Paper](http://proceedings.mlr.press/v87/richards18a/richards18a.pdf), 
[Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safe_learning) (Accepted by CoRL 2018)\n- OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World, [Paper](https://arxiv.org/pdf/1709.07643.pdf), Not Find Code (Accepted by ICRA 2018)\n- Safe learning of quadrotor dynamics using barrier certificates, [Paper](https://ieeexplore.ieee.org/iel7/8449910/8460178/08460471.pdf), Not Find Code (Accepted by ICRA 2018)\n- Safe reinforcement learning on autonomous vehicles, [Paper](https://arxiv.org/pdf/1910.00399.pdf), Not Find Code (Accepted by IROS 2018)\n- Trial without error: Towards safe reinforcement learning via human intervention, [Paper](https://arxiv.org/pdf/1707.05173.pdf), [Code](https://github.com/gsastry/human-rl) (Accepted by AAMAS 2018)\n- Safe reinforcement learning: Learning with supervision using a constraint-admissible set, [Paper](https://ieeexplore.ieee.org/abstract/document/8430770), Not Find Code (Accepted by Annual American Control Conference (ACC) 2018)\n- A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=8493361), Not Find Code (Accepted by IEEE Transactions on Automatic Control 2018)\n- Safe exploration algorithms for reinforcement learning controllers, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7842559), Not Find Code (Accepted by IEEE transactions on neural networks and learning systems 2018)\n- Verification and repair of control policies for safe reinforcement learning, [Paper](https://link.springer.com/content/pdf/10.1007/s10489-017-0999-8.pdf), Not Find Code (Accepted by Applied Intelligence, 2018)\n- Safe Exploration in Continuous Action Spaces, 
[Paper](https://www.researchgate.net/profile/Gal-Dalal/publication/322756278_Safe_Exploration_in_Continuous_Action_Spaces/links/5a71e84faca2720bc0d940b3/Safe-Exploration-in-Continuous-Action-Spaces.pdf), [Code](https://github.com/AgrawalAmey/safe-explorer), (only Arxiv, 2018, citation 200+)\n- Safe exploration of nonlinear dynamical systems: A predictive safety filter for reinforcement learning, [Paper](https://www.researchgate.net/profile/Kim-Wabersich/publication/329641554_Safe_exploration_of_nonlinear_dynamical_systems_A_predictive_safety_filter_for_reinforcement_learning/links/5ede2aab299bf1d20bd87981/Safe-exploration-of-nonlinear-dynamical-systems-A-predictive-safety-filter-for-reinforcement-learning.pdf), Not Find Code (Arxiv, 2018, citation 40+)\n- Batch policy learning under constraints, [Paper](http://proceedings.mlr.press/v97/le19a/le19a.pdf), [Code](https://github.com/clvoloshin/constrained_batch_policy_learning) (Accepted by ICML 2019)\n- Safe Policy Improvement with Baseline Bootstrapping, [Paper](https://www.researchgate.net/profile/Romain-Laroche/publication/334749134_Safe_Policy_Improvement_with_Baseline_Bootstrapping/links/5d3f3b634585153e592ceeb4/Safe-Policy-Improvement-with-Baseline-Bootstrapping.pdf), Not Find Code (Accepted by ICML 2019)\n- Convergent Policy Optimization for Safe Reinforcement Learning, [Paper](https://proceedings.neurips.cc/paper/2019/file/db29450c3f5e97f97846693611f98c15-Paper.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/Safe_reinforcement_learning) (Accepted by NeurIPS 2019)\n- Constrained reinforcement learning has zero duality gap, [Paper](https://www.researchgate.net/profile/Luiz-Chamon/publication/336889860_Constrained_Reinforcement_Learning_Has_Zero_Duality_Gap/links/5ef4df204585155050726b42/Constrained-Reinforcement-Learning-Has-Zero-Duality-Gap.pdf), Not Find Code (Accepted by NeurIPS 2019)\n- Reinforcement learning with convex constraints, 
[Paper](https://www.cs.princeton.edu/~syoosefi/papers/NeurIPS2019.pdf), [Code](https://github.com/xkianteb/ApproPO) (Accepted by NeurIPS 2019)\n- Reward constrained policy optimization, [Paper](https://arxiv.org/pdf/1805.11074.pdf), Not Find Code (Accepted by ICLR 2019)\n- Supervised policy update for deep reinforcement learning, [Paper](https://arxiv.org/pdf/1805.11706.pdf), [Code](https://github.com/quanvuong/Supervised_Policy_Update), (Accepted by ICLR 2019)\n- End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks, [Paper](https://arxiv.org/pdf/1903.08792.pdf), [Code](https://github.com/rcheng805/RL-CBF) (Accepted by AAAI 2019)\n- Lyapunov-based safe policy optimization for continuous control, [Paper](https://openreview.net/pdf?id=SJgUYBVLsN), Not Find Code (Accepted by ICML Workshop RL4RealLife 2019)\n- Safe reinforcement learning with model uncertainty estimates, [Paper](https://arxiv.org/pdf/1810.08700.pdf), Not Find Code (Accepted by ICRA 2019)\n- Safe reinforcement learning with scene decomposition for navigating complex urban environments, [Paper](https://arxiv.org/pdf/1904.11483.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/AutomotiveSafeRL), (Accepted by IV 2019)\n- Verifiably safe off-model reinforcement learning, [Paper](https://link.springer.com/chapter/10.1007/978-3-030-17462-0_28), [Code](https://github.com/IBM/vsrl-framework/blob/42e0853bffb5efbb66cd97178aff9e10ad18c5a9/README.md) (Accepted by International Conference on Tools and Algorithms for the Construction and Analysis of Systems 2019)\n- Probabilistic policy reuse for safe reinforcement learning, [Paper](https://dl.acm.org/doi/pdf/10.1145/3310090?casa_token=OahWDUpVTxAAAAAA:MVJd1GjD6HDpFKMxXfp9pd3KaJbG879P7qvcMS0-VDGFAR0prYuXwzN9LwI4BfkPti085CGGhsz1llY), Not Find Code, (Accepted by ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2019)\n- Projected stochastic primal-dual 
method for constrained online learning with kernels, [Paper](https://ieeexplore.ieee.org/ielaam/78/8691646/8678800-aam.pdf), Not Find Code, (Accepted by IEEE Transactions on Signal Processing, 2019)\n- Resource constrained deep reinforcement learning, [Paper](https://arxiv.org/pdf/1812.00600.pdf), Not Find Code, (Accepted by 29th International Conference on Automated Planning and Scheduling  2019)\n- Temporal logic guided safe reinforcement learning using control barrier functions, [Paper](https://arxiv.org/pdf/1903.09885.pdf), Not Find Code (Arxiv, Citation 25+, 2019)\n- Safe policies for reinforcement learning via primal-dual methods, [Paper](https://www.researchgate.net/profile/Luiz-Chamon/publication/337438444_Safe_Policies_for_Reinforcement_Learning_via_Primal-Dual_Methods/links/5ef4df1f299bf18816e7f62c/Safe-Policies-for-Reinforcement-Learning-via-Primal-Dual-Methods.pdf), Not Find Code (Arxiv, Citation 25+, 2019)\n- Value constrained model-free continuous control, [Paper](https://arxiv.org/pdf/1902.04623.pdf), Not Find Code (Arxiv, Citation 35+, 2019)\n- Safe Reinforcement Learning in Constrained Markov Decision Processes (SNO-MDP), [Paper](http://proceedings.mlr.press/v119/wachi20a/wachi20a.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safe_near_optimal_mdp) (Accepted by ICML 2020)\n- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods, [Paper](http://proceedings.mlr.press/v119/stooke20a/stooke20a.pdf), [Code](https://github.com/keirp/glamor/tree/98681a23bae9e8e5e9fbf68a0316ca2a22a27593/dependencies/rlpyt/rlpyt/projects/safe) (Accepted by ICML 2020)\n- Constrained markov decision processes via backward value functions, [Paper](http://proceedings.mlr.press/v119/satija20a/satija20a.pdf), [Code](https://github.com/hercky/cmdps_via_bvf/tree/69b9f51cb6410673d0aa2e5b9c980b33e5a46dda) (Accepted by ICML 2020)\n- Projection-Based Constrained Policy Optimization (PCPO), 
[Paper](https://arxiv.org/pdf/2010.03152.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/PCPO) (Accepted by ICLR 2020)\n- First order constrained optimization in policy space (FOCOPS),[Paper](https://proceedings.neurips.cc/paper/2020/file/af5d5ef24881f3c3049a7b9bfe74d58b-Paper.pdf), [Code](https://github.com/ymzhang01/focops) (Accepted by NeurIPS 2020)\n- Safe reinforcement learning via curriculum induction, [Paper](https://proceedings.neurips.cc/paper/2020/file/8df6a65941e4c9da40a4fb899de65c55-Paper.pdf), [Code](https://github.com/zuzuba/CISR_NeurIPS20) (Accepted by NeurIPS 2020)\n- Constrained episodic reinforcement learning in concave-convex and knapsack settings, [Paper](https://arxiv.org/pdf/2006.05051.pdf), [Code](https://github.com/miryoosefi/ConRL) (Accepted by NeurIPS 2020)\n- Risk-sensitive reinforcement learning: Near-optimal risk-sample tradeoff in regret, [Paper](https://arxiv.org/pdf/2006.13827.pdf), Not Find Code  (Accepted by NeurIPS 2020)\n- Upper confidence primal-dual reinforcement learning for CMDP with adversarial loss, [Paper](https://proceedings.neurips.cc/paper_files/paper/2020/file/ae95296e27d7f695f891cd26b4f37078-Paper.pdf), Not Find Code  (Accepted by NeurIPS 2020)\n- IPO: Interior-point Policy Optimization under Constraints, [Paper](https://www.researchgate.net/profile/Yongshuai-Liu/publication/336735393_IPO_Interior-point_Policy_Optimization_under_Constraints/links/5e1670874585159aa4bff037/IPO-Interior-point-Policy-Optimization-under-Constraints.pdf), Not Find Code (Accepted by AAAI 2020)\n- Safe reinforcement learning using robust mpc, [Paper](https://arxiv.org/pdf/1906.04005.pdf), Not Find Code (IEEE Transactions on Automatic Control, 2020)\n- Safe reinforcement learning via projection on a safe set: How to achieve optimality? 
[Paper](https://arxiv.org/pdf/2004.00915.pdf), Not Find Code (Accepted by IFAC 2020)\n- Reinforcement learning for safety-critical control under model uncertainty, using control lyapunov functions and control barrier functions, [Paper](http://www.roboticsproceedings.org/rss16/p088.pdf), Not Find Code (Accepted by RSS 2020)\n- Learning Transferable Domain Priors for Safe Exploration in Reinforcement Learning, [Paper](https://arxiv.org/pdf/1909.04307.pdf), [Code](https://github.com/GKthom/Priors-for-safe-exploration), (Accepted by International Joint Conference on Neural Networks (IJCNN) 2020)\n- Safe reinforcement learning through meta-learned instincts, [Paper](https://arxiv.org/pdf/2005.03233.pdf), Not Find Code (Accepted by The Conference on Artificial Life 2020)\n- Learning safe policies with cost-sensitive advantage estimation, [Paper](https://openreview.net/pdf?id=uVnhiRaW3J), Not Find Code (Openreview 2020)\n- Safe reinforcement learning using probabilistic shields, [Paper](https://repository.ubn.ru.nl/bitstream/handle/2066/224966/224966.pdf?sequence=1), Not Find Code (2020)\n- A constrained reinforcement learning based approach for network slicing, [Paper](https://icnp20.cs.ucr.edu/proceedings/hdrnets/A%20Constrained%20Reinforcement%20Learning%20Based%20Approach%20for%20Network%20Slicing.pdf),  Not Find Code (Accepted by IEEE 28th International Conference on Network Protocols (ICNP) 2020)\n- Safe reinforcement learning: A control barrier function optimization approach, [Paper](https://onlinelibrary.wiley.com/doi/epdf/10.1002/rnc.5132), Not Find Code (Accepted by the International Journal of Robust and Nonlinear Control)\n- Exploration-exploitation in constrained mdps, [Paper](https://arxiv.org/pdf/2003.02189.pdf), Not Find Code (Arxiv, 2020)\n- Safe reinforcement learning using advantage-based intervention, [Paper](http://proceedings.mlr.press/v139/wagener21a/wagener21a.pdf), [Code](https://github.com/nolanwagener/safe_rl) (Accepted by ICML 2021)\n- 
Shortest-path constrained reinforcement learning for sparse reward tasks, [Paper](https://arxiv.org/pdf/2107.06405.pdf), [Code](https://github.com/srsohn/shortest-path-rl), (Accepted by ICML 2021)\n- Density constrained reinforcement learning, [Paper](https://arxiv.org/pdf/2106.12764.pdf), Not Find Code (Accepted by ICML 2021)\n- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee, [Paper](https://arxiv.org/pdf/2011.05869.pdf), Not Find Code (Accepted by ICML 2021)\n- Safe reinforcement learning with linear function approximation, [Paper](https://proceedings.mlr.press/v139/amani21a/amani21a.pdf), Not Find Code (Accepted by ICML 2021)\n- Safe Reinforcement Learning by Imagining the Near Future (SMBPO), [Paper](https://proceedings.neurips.cc/paper/2021/file/73b277c11266681122132d024f53a75b-Paper.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/Safe-MBPO) (Accepted by NeurIPS 2021) \n- Towards safe reinforcement learning with a safety editor policy, [Paper](https://arxiv.org/pdf/2201.12427.pdf), [Code](https://github.com/hnyu/seditor) (Accepted by NeurIPS 2021)\n- Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning, [Paper](https://arxiv.org/pdf/2111.03947.pdf),  Not Find Code (Accepted by NeurIPS 2021)\n- Risk-Sensitive Reinforcement Learning: Symmetry, Asymmetry, and Risk-Sample Tradeoff, [Paper](https://arxiv.org/pdf/2111.03947.pdf),  Not Find Code (Accepted by NeurIPS 2021)\n- Safe reinforcement learning with natural language constraints, [Paper](https://proceedings.neurips.cc/paper/2021/file/72f67e70f6b7cdc4cc893edaddf0c4c6-Paper.pdf), [Code](https://github.com/princeton-nlp/SRL-NLC), (Accepted by NeurIPS 2021)\n- Learning policies with zero or bounded constraint violation for constrained mdps, [Paper](https://arxiv.org/pdf/2106.02684.pdf),  Not Find Code (Accepted by NeurIPS 2021)\n-  Conservative safety critics for exploration, 
[Paper](https://arxiv.org/pdf/2010.14497.pdf), Not Find Code (Accepted by ICLR 2021)\n-  Wcsac: Worst-case soft actor critic for safety-constrained reinforcement learning, [Paper](https://www.st.ewi.tudelft.nl/mtjspaan/pub/Yang21aaai.pdf), Not Find Code (Accepted by AAAI 2021)\n-  Risk-averse trust region optimization for reward-volatility reduction, [Paper](https://arxiv.org/pdf/1912.03193.pdf), Not Find Code (Accepted by IJCAI 2021)\n- AlwaysSafe: Reinforcement Learning Without Safety Constraint Violations During Training, [Paper](https://pure.tudelft.nl/ws/files/96913978/p1226.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/AlwaysSafe) (Accepted by AAMAS 2021)\n- Safe Continuous Control with Constrained Model-Based Policy Optimization (CMBPO), [Paper](https://arxiv.org/pdf/2104.06922.pdf), [Code](https://github.com/anyboby/Constrained-Model-Based-Policy-Optimization) (Accepted by IROS 2021)\n- Context-aware safe reinforcement learning for non-stationary environments, [Paper](https://arxiv.org/pdf/2101.00531.pdf), [Code](https://github.com/baimingc/casrl) (Accepted by ICRA 2021)\n- Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9636468), [Code](https://github.com/mahaitongdae/safe_exp_env) (Accepted by IROS 2021)\n- Robot Reinforcement Learning on the Constraint Manifold, [Paper](https://proceedings.mlr.press/v164/liu22c/liu22c.pdf), [Code](https://github.com/PuzeLiu/rl_on_manifold) (Accepted by CoRL 2021)\n- Provably efficient safe exploration via primal-dual policy optimization, [Paper](https://arxiv.org/pdf/2003.00534.pdf), Not Find Code (Accepted by the International Conference on Artificial Intelligence and Statistics 2021)\n- Safe model-based reinforcement learning with robust cross-entropy method, [Paper](https://aisecure-workshop.github.io/aml-iclr2021/papers/8.pdf), 
[Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/safe-mbrl) (Accepted by ICLR 2021 Workshop on Security and Safety in Machine Learning Systems)\n- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance, [Paper](https://arxiv.org/pdf/2112.03575.pdf), [Code](https://github.com/michaelzhiluo/mesa-safe-rl) (Accepted by Workshop on Safe and Robust Control of Uncertain Systems at NeurIPS 2021)\n- Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks, [Paper](http://proceedings.mlr.press/v144/zheng21a/zheng21a.pdf), [Code](https://github.com/chauncygu/Safe-Reinforcement-Learning-Baseline/tree/main/Safe-RL/vertex-net) (Accepted by Conference on Learning for Dynamics and Control 2021)\n- Can You Trust Your Autonomous Car? Interpretable and Verifiably Safe Reinforcement Learning, [Paper](http://download.cmutschler.de/publications/2021/IV2021.pdf), Not Find Code (Accepted by IV 2021)\n- Provably safe model-based meta reinforcement learning: An abstraction-based approach, [Paper](https://arxiv.org/pdf/2109.01255.pdf), Not Find Code (Accepted by CDC 2021)\n- Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones, [Paper](https://www.researchgate.net/profile/Minho-Hwang/publication/345152769_Recovery_RL_Safe_Reinforcement_Learning_with_Learned_Recovery_Zones/links/5fe37ea2299bf140883a35cb/Recovery-RL-Safe-Reinforcement-Learning-with-Learned-Recovery-Zones.pdf), [Code](https://github.com/abalakrishna123/recovery-rl), (Accepted by IEEE RAL, 2021)\n- Reinforcement learning control of constrained dynamic systems with uniformly ultimate boundedness stability guarantee, [Paper](https://www.sciencedirect.com/science/article/pii/S0005109821002090), Not Find Code (Accepted by Automatica, 2021)\n- A predictive safety filter for learning-based control of constrained nonlinear dynamical systems, [Paper](https://arxiv.org/pdf/1812.05506.pdf), Not Find Code (Accepted by Automatica, 2021)\n- A simple 
reward-free approach to constrained reinforcement learning, [Paper](https://www.cs.princeton.edu/~syoosefi/papers/reward-free2021.pdf),  Not Find Code (Arxiv, 2021)\n- State augmented constrained reinforcement learning: Overcoming the limitations of learning with rewards, [Paper](https://arxiv.org/pdf/2102.11941.pdf),  Not Find Code (Arxiv, 2021)\n- DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention, [Paper](https://arxiv.org/pdf/2110.14468.pdf),  Not Find Code (Arxiv, 2021)\n- Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions, [Paper](https://arxiv.org/pdf/2104.08171.pdf), Not Find Code (Arxiv, 2021)\n- Constrained Variational Policy Optimization for Safe Reinforcement Learning, [Paper](https://arxiv.org/pdf/2201.11927.pdf), [Code](https://github.com/liuzuxin/cvpo-safe-rl) (ICML 2022)\n- Provably efficient model-free constrained rl with linear function approximation, [Paper](https://arxiv.org/pdf/2206.11889), Not Find Code (NeurIPS 2022)\n- Constrained Policy Optimization via Bayesian World Models, [Paper](https://arxiv.org/pdf/2201.09802), [Code](https://github.com/yardenas/la-mbda) (ICLR 2022)\n- Stability-Constrained Markov Decision Processes Using MPC, [Paper](https://arxiv.org/pdf/2102.01383.pdf), Not Find Code (Accepted by Automatica, 2022)\n- Constrained Reinforcement Learning for Vehicle Motion Planning with Topological Reachability Analysis, [Paper](https://www.mdpi.com/2218-6581/11/4/81/pdf), Not Find Code (Accepted by Robotics, 2022)\n- Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation, [Paper](https://proceedings.mlr.press/v151/wei22a/wei22a.pdf), [Code](https://github.com/honghaow/Triple-q) (Accepted by AISTATS 2022)\n- Safe reinforcement learning using robust action governor, [Paper](https://arxiv.org/pdf/2102.10643.pdf), Not Find Code (Accepted by Learning for Dynamics and Control, 2022)\n- A primal-dual 
approach to constrained markov decision processes, [Paper](https://arxiv.org/pdf/2101.10895.pdf),  Not Find Code (Arxiv, 2022)\n- SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation, [Paper](https://arxiv.org/pdf/2202.06558.pdf), Not Find Code (Arxiv, 2022)\n- Finding Safe Zones of policies Markov Decision Processes, [Paper](https://arxiv.org/pdf/2202.11593.pdf), Not Find Code (Arxiv, 2022)\n- CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning, [Paper](https://arxiv.org/pdf/2202.07565.pdf), [Code](https://github.com/RL-boxes/Safe-RL) (Arxiv, 2022)\n- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition, [Paper](https://arxiv.org/pdf/2202.04849.pdf), Not Find Code (Arxiv, 2022)\n- Penalized Proximal Policy Optimization for Safe Reinforcement Learning, [Paper](https://arxiv.org/pdf/2205.11814.pdf), Not Find Code (Arxiv, 2022)\n- Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning, [Paper](https://arxiv.org/pdf/2206.07376.pdf), Not Find Code (Arxiv, 2022)\n- Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs, [Paper](https://arxiv.org/pdf/2206.02346.pdf), Not Find Code (Arxiv, 2022)\n- Guided Safe Shooting: model based reinforcement learning with safety constraints, [Paper](https://arxiv.org/pdf/2206.09743.pdf), Not Find Code (Arxiv, 2022)\n- Safe Reinforcement Learning via Confidence-Based Filters, [Paper](https://arxiv.org/pdf/2207.01337.pdf), Not Find Code (Arxiv, 2022)\n- TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning, [Paper](https://ieeexplore.ieee.org/document/9677982), [Code](https://github.com/rllab-snu/Trust-Region-CVaR) (Accepted by IEEE RAL, 2022)\n- Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk, [Paper](https://ieeexplore.ieee.org/document/9802647), Not Find Code (Accepted by IEEE RAL, 2022)\n- Enhancing Safe Exploration Using Safety 
State Augmentation, [Paper](https://arxiv.org/pdf/2206.02675), Not Find Code (Arxiv, 2022)\n- Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk, [Paper](https://arxiv.org/pdf/2206.04436.pdf), Not Find Code (Accepted by IJCAI 2022)\n- Safe reinforcement learning of dynamic high-dimensional robotic tasks: navigation, manipulation, interaction, [Paper](https://arxiv.org/pdf/2209.13308.pdf), Not Find Code (Arxiv, 2022)\n- Safe Exploration Method for Reinforcement Learning under Existence of Disturbance, [Paper](https://arxiv.org/pdf/2209.15452.pdf), Not Find Code (Arxiv, 2022)\n- Guiding Safe Exploration with Weakest Preconditions, [Paper](https://arxiv.org/pdf/2209.14148.pdf), [Code](https://github.com/gavlegoat/spice) (Arxiv, 2022)\n- Temporal logic guided safe model-based reinforcement learning: A hybrid systems approach, [Paper](https://www.sciencedirect.com/science/article/pii/S1751570X22000905), Not Find Code (Accepted by Nonlinear Analysis: Hybrid Systems, 2022)\n- Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes, [Paper](https://arxiv.org/pdf/2210.10691.pdf),  Not Find Code (Arxiv, 2022)\n- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm, [Paper](https://arxiv.org/pdf/2210.07573.pdf),  [Code](https://github.com/akjayant/mbppol) (Arxiv, 2022)\n- Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate, [Paper](https://arxiv.org/pdf/2210.07553.pdf), Not Find Code (Arxiv, 2022)\n- UNIFY: a Unified Policy Designing Framework for Solving Constrained Optimization Problems with Machine Learning, [Paper](https://arxiv.org/pdf/2210.14030.pdf), Not Find Code (Arxiv, 2022)\n- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments, [Paper](https://arxiv.org/pdf/2209.15090.pdf),  Not Find Code (Arxiv, 2022)\n- Safe Reinforcement Learning Using 
Robust Control Barrier Functions, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=\u0026arnumber=9928337), Not Find Code (Accepted by IEEE RAL, 2022)\n- Model-free Neural Lyapunov Control for Safe Robot Navigation, [Paper](https://arxiv.org/pdf/2203.01190.pdf), [Code](https://github.com/ZikangXiong/MFNLC), [Demo](https://sites.google.com/view/mf-nlc) (Accepted by IROS 2022)\n- Safe Reinforcement Learning via Probabilistic Logic Shields, [Paper](https://www.ijcai.org/proceedings/2023/0637.pdf), [Code](https://github.com/wenchiyang/pls) (Accepted by IJCAI 2023, Distinguished Paper Award)\n- Towards robust and safe reinforcement learning with benign off-policy data, [Paper](https://proceedings.mlr.press/v202/liu23l/liu23l.pdf),  Not Find Code (Accepted by ICML 2023)\n- Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments, [Paper](https://proceedings.mlr.press/v202/wang23as/wang23as.pdf),  Not Find Code (Accepted by ICML 2023)\n- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL, [Paper](https://arxiv.org/pdf/2206.14057),  Not Find Code (Accepted by ICLR 2023)\n- A CMDP-within-online framework for Meta-Safe Reinforcement Learning, [Paper](https://openreview.net/pdf?id=mbxz9Cjehr),  Not Find Code (Accepted by ICLR 2023)\n- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization, [Paper](https://arxiv.org/pdf/2311.00880), [Code](https://github.com/SafeRL-Lab/SCPO) (Arxiv, 2023)\n- Shielded Reinforcement Learning for Hybrid Systems, [Paper](https://link.springer.com/chapter/10.1007/978-3-031-46002-9_3) [(Arxiv)](https://arxiv.org/abs/2308.14424), [Code](https://github.com/AsgerHB/Shielded-Learning-for-Hybrid-Systems) (AISOLA, 2023)\n- Adaptive primal-dual method for safe reinforcement learning, [Paper](https://arxiv.org/pdf/2402.00355), Not Find Code (Accepted by AAMAS 2024)\n- Probabilistic constraint for safety-critical reinforcement learning, 
[Paper](https://ieeexplore.ieee.org/iel7/9/4601496/10475493.pdf), Not Find Code (Accepted by TAC)\n- Generalized constraint for probabilistic safe reinforcement learning, [Paper](https://proceedings.mlr.press/v242/chen24b/chen24b.pdf), Not Find Code (Accepted by L4DC 2024)\n- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning, [Paper](https://www.jmlr.org/papers/volume25/22-0878/22-0878.pdf), [Code](https://github.com/Ilnura/LB_SGD) (JMLR, 2024)\n- Provably safe reinforcement learning with step-wise violation constraints, [Paper](https://arxiv.org/pdf/2302.06064), Not Find Code (Accepted by NeurIPS 2024)\n- Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation, [Paper](https://ojs.aaai.org/index.php/AAAI/article/view/30102/31944), Not Find Code (Accepted by AAAI 2024)\n- Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models, [Paper](https://arxiv.org/pdf/2401.07553v1), Not Find Code (Accepted by AAMAS 2024)\n- Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation, [Paper](https://arxiv.org/pdf/2405.20860), Not Find Code (Arxiv, 2024)\n- Confident Natural Policy Gradient for Local Planning in qπ-realizable Constrained MDPs, [Paper](https://arxiv.org/pdf/2406.18529), Not Find Code (Arxiv, 2024)\n- Safe Exploration Using Bayesian World Models and Log-Barrier Optimization, [Paper](https://arxiv.org/pdf/2405.05890), [Code](https://anonymous.4open.science/r/safe-opax-F5FF/README.md) (Arxiv, 2024)\n- Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning, [Paper](https://arxiv.org/pdf/2405.16390), [Code](https://github.com/SafeRL-Lab/CMORL) (Accepted by IEEE TPAMI 2025)\n\n\n\n\n\n#### 2.2. 
Safe Multi-Agent RL Baselines\n- Multi-Agent Constrained Policy Optimisation (MACPO), [Paper](https://arxiv.org/pdf/2110.02793.pdf), [Code](https://github.com/chauncygu/Multi-Agent-Constrained-Policy-Optimisation) (Arxiv, 2021)\n- MAPPO-Lagrangian, [Paper](https://arxiv.org/pdf/2110.02793.pdf), [Code](https://github.com/chauncygu/Multi-Agent-Constrained-Policy-Optimisation) (Arxiv, 2021)\n- Decentralized policy gradient descent ascent for safe multi-agent reinforcement learning, [Paper](https://chentianyi1991.github.io/aaai.pdf), Not Find Code (Accepted by AAAI 2021)\n- Safe multi-agent reinforcement learning via shielding, [Paper](https://arxiv.org/pdf/2101.11196.pdf), Not Find Code (Accepted by AAMAS 2021)\n- CMIX: Deep Multi-agent Reinforcement Learning with Peak and Average Constraints, [Paper](https://2021.ecmlpkdd.org/wp-content/uploads/2021/07/sub_181.pdf), Not Find Code (Accepted by Joint European Conference on Machine Learning and Knowledge Discovery in Databases 2021)\n- Safe multi-agent reinforcement learning through decentralized multiple control barrier functions, [Paper](https://arxiv.org/pdf/2103.12553.pdf), Not Find Code (Arxiv, 2021)\n- CAMA: A New Framework for Safe Multi-Agent Reinforcement Learning Using Constraint Augmentation, [Paper](https://openreview.net/pdf?id=jK02XX9ZpJkt), Not Find Code (OpenReview, 2022)\n- Shield decentralization for safe multi-agent reinforcement learning, [Paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/57444e14ecd9e2c8f603b4f012ce3811-Paper-Conference.pdf), Not Find Code (NeurIPS 2022)\n\n\n### 3. 
Surveys\n- A Review of Safe Reinforcement Learning: Methods, Theory and Applications, [Paper](https://arxiv.org/pdf/2205.10330.pdf) (IEEE TPAMI, 2024)\n- State-wise Safe Reinforcement Learning: A Survey, [Paper](https://arxiv.org/pdf/2302.03122.pdf) (Accepted by IJCAI 2023)\n- Policy learning with constraints in model-free reinforcement learning: A survey, [Paper](https://web.archive.org/web/20210812230501id_/https://www.ijcai.org/proceedings/2021/0614.pdf) (Accepted by IJCAI 2021)\n- Safe learning in robotics: From learning-based control to safe reinforcement learning, [Paper](https://arxiv.org/pdf/2108.06266.pdf) (Accepted by Annual Review of Control, Robotics, and Autonomous Systems, 2021)\n- Safe learning and optimization techniques: Towards a survey of the state of the art, [Paper](https://arxiv.org/pdf/2101.09505.pdf) (Accepted by the International Workshop on the Foundations of Trustworthy AI Integrating Learning, Optimization and Reasoning, 2020)\n- A comprehensive survey on safe reinforcement learning, [Paper](https://www.jmlr.org/papers/volume16/garcia15a/garcia15a.pdf) (Accepted by Journal of Machine Learning Research, 2015)\n\n\n\n\n### 4. Theses\n- Safe Reinforcement Learning to Make Decisions in Robotics, [Thesis](https://people.eecs.berkeley.edu/~shangding.gu/papers/PhD_Dissertation_Shangding_Gu_2024.pdf) (PhD thesis, Shangding Gu, TU Munich, 2024)\n- Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics, [Thesis](https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/370833/1/root.pdf) (PhD thesis, Felix Berkenkamp, ETH Zurich, 2019)\n- Safe reinforcement learning, [Thesis](https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1527\u0026context=dissertations_2) (PhD thesis, Philip S. Thomas, University of Massachusetts Amherst, 2015)\n\n\n\n\n### 5. 
Book\n- Constrained Markov decision processes: stochastic modeling, [Book](https://www-sop.inria.fr/members/Eitan.Altman/PAPERS/h.pdf), (Eitan Altman, Routledge, 1999)\n\n### 6. Tutorials\n- Safe Reinforcement Learning: Bridging Theory and Practice, [tutorial](https://docs.google.com/presentation/d/1slZyKj1G_XvtH8laWMClcQVMLbiQyqKW25cV9gY3ypE/edit?usp=sharing), (Ming Jin \u0026 Shangding Gu, 2024)\n- Safe Reinforcement Learning for Smart Grid Control and Operations, [tutorial](https://docs.google.com/presentation/d/1o3t3KMfgCL5fo_zHZH2ChMkJTkbJ7sY7lMomBE8iRNE/edit?usp=sharing), (Ming Jin \u0026 Shangding Gu, 2024)\n- Safe Reinforcement Learning, [tutorial](https://drive.google.com/file/d/1Hpu9HZbXkurTMWvj63m-aLYxay66E2Vz/view), (Felix Berkenkamp, 2023)\n- Primal-Dual Methods, [tutorial](https://drive.google.com/file/d/1_NRil0__6375nIqMT6jXw-PB6CkwvvDH/view), (Gergely Neu, 2023)\n\n### 7. Exercise\n- Primal-Dual Reinforcement Learning, [exercise code](https://github.com/tyrion/primal-dual-exercise) and [exercise Colab](https://colab.research.google.com/github/tyrion/primal-dual-exercise/blob/master/Primal_Dual_Colab.ipynb), (Germano Gabbianelli, 2023)\n\n\n## Publication\nIf you find the repository useful, please cite the [paper](https://arxiv.org/abs/2205.10330):\n```\n@article{gu2024review,\n  title={A Review of Safe Reinforcement Learning: Methods, Theories and Applications},\n  author={Gu, Shangding and Yang, Long and Du, Yali and Chen, Guang and Walter, Florian and Wang, Jun and Knoll, Alois},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n  year={2024},\n  
publisher={IEEE}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchauncygu%2Fsafe-reinforcement-learning-baselines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchauncygu%2Fsafe-reinforcement-learning-baselines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchauncygu%2Fsafe-reinforcement-learning-baselines/lists"}