{"id":13706220,"url":"https://github.com/SvenGronauer/Bullet-Safety-Gym","last_synced_at":"2025-05-05T20:30:45.752Z","repository":{"id":41059647,"uuid":"343567536","full_name":"SvenGronauer/Bullet-Safety-Gym","owner":"SvenGronauer","description":"An open-source framework to benchmark and assess safety specifications of Reinforcement Learning problems.","archived":false,"fork":false,"pushed_at":"2023-07-07T10:10:40.000Z","size":14433,"stargazers_count":62,"open_issues_count":0,"forks_count":13,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-03T22:16:50.041Z","etag":null,"topics":["gym","gym-environments","open-source","pybullet","reinforcement-learning","safety"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SvenGronauer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-03-01T21:53:05.000Z","updated_at":"2024-06-25T20:47:58.000Z","dependencies_parsed_at":"2022-09-01T11:50:15.514Z","dependency_job_id":null,"html_url":"https://github.com/SvenGronauer/Bullet-Safety-Gym","commit_stats":{"total_commits":5,"total_committers":3,"mean_commits":"1.6666666666666667","dds":0.6,"last_synced_commit":"df7f9a22b52bad90df4fcf5769e9490ca46146ad"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SvenGronauer%2FBullet-Safety-Gym","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SvenGronauer%2FBullet-Safety-Gym/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SvenGronauer%2FBullet
-Safety-Gym/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SvenGronauer%2FBullet-Safety-Gym/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SvenGronauer","download_url":"https://codeload.github.com/SvenGronauer/Bullet-Safety-Gym/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224465730,"owners_count":17315864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gym","gym-environments","open-source","pybullet","reinforcement-learning","safety"],"created_at":"2024-08-02T22:00:53.259Z","updated_at":"2024-11-13T14:30:42.385Z","avatar_url":"https://github.com/SvenGronauer.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["ICML 2023"],"readme":"**Status:** `Archive` (code is provided as-is). 
For actively maintained safe RL repos, please see: \n\u003ca href=\"https://github.com/PKU-Alignment/safety-gymnasium\"\u003eSafety Gymnasium\u003c/a\u003e and \n\u003ca href=\"https://github.com/utiasDSL/safe-control-gym\"\u003eSafe Control Gym\u003c/a\u003e\n\n\n# Bullet-Safety-Gym\n\u003cp align=\"center\"\u003e\n• \u003ca href=\"https://github.com/SvenGronauer/Bullet-Safety-Gym#technical-report-and-benchmark-results\"\u003eTechnical Report\u003c/a\u003e\n• \u003ca href=\"https://github.com/SvenGronauer/Bullet-Safety-Gym#installation\"\u003eInstallation\u003c/a\u003e \n• \u003ca href=\"https://github.com/SvenGronauer/Bullet-Safety-Gym#getting-started\"\u003eGetting Started\u003c/a\u003e\n•\n\u003c/p\u003e\n\n\"Bullet-Safety-Gym\" is a free and open-source framework to benchmark and assess \nsafety specifications in Reinforcement Learning (RL) problems. \n\n## Technical Report and Benchmark Results\n\nBaseline results and comparisons can be found in the [technical report](https://mediatum.ub.tum.de/1639974) [(PDF)](https://mediatum.ub.tum.de/doc/1639974/1639974.pdf).\n\nIf you like this repo, please use the following citation:\n\u003cpre\u003e\n@techreport{Gronauer2022BulletSafetyGym,\n\tauthor = {Gronauer, Sven},\n\tinstitution = {mediaTUM},\n\ttitle = {Bullet-Safety-Gym: A Framework for Constrained Reinforcement Learning},\n\tyear = {2022},\n\tdoi = {10.14459/2022md1639974},\n\tbdsk-url-1 = {https://mediatum.ub.tum.de/1639974}}\n\u003c/pre\u003e\n\n\n## Yet another Gym?\n\nBenchmarks are indispensable for assessing scientific progress. In recent \nyears, a plethora of benchmark frameworks has emerged, but we felt that the \nfollowing points have not been sufficiently addressed yet:\n\n+ **Increasing awareness of safety in AI**: The number of publications regarding\nsafety is rising. 
While deep RL is making big strides out of its infancy, a \nmajority of environments consider only reward maximization and have no explicit \nconnection to safety concepts.\n\n+ **Unified repository**: \nAlthough diverse safety environments have been used in recent works,\nrepositories that unify and standardize them into one framework are scarce (e.g.\nRay et al. 2019).\n\n+ **Strong Reproducibility**: Many state-of-the-art RL papers do not rely on \nopen-source software, which may slow down the pace of research compared to when \nexperiments are freely accessible to everyone. For a recent and heated discussion\nabout reproducibility, see the following ICLR review posted\n[on Reddit](https://www.reddit.com/r/MachineLearning/comments/jssmia/d_an_iclr_submission_is_given_a_clear_rejection/).\n\n## Implemented Agents\n\nBullet-Safety-Gym is shipped with the following four agents:\n\n+ **Ball**: A spherically shaped agent which can freely move on the xy-plane. \n+ **Car**: A four-wheeled agent based on MIT's Racecar.\n+ **Drone**: An air vehicle based on the AscTec Hummingbird quadrotor. \n+ **Ant**: A four-legged animal with a spherical torso.\n\n\nBall | Car | Drone | Ant\n--- | ---| ---| ---\n![Ball](./docs/figures/agent_ball.png) |![Car Agent](./docs/figures/agent_car.png)|![Drone Agent](./docs/figures/agent_drone.png)|![Ant Agent](./docs/figures/agent_ant.png)\n\n\n\n## Tasks\n\n+ **Circle**: \nAgents are expected to move on a circle in clockwise direction (as proposed \nby Achiam et al. (2017)). The reward is dense and increases with the agent's \nvelocity and with its proximity to the boundary of the circle. Costs are \nreceived when the agent leaves the safety zone defined by the two yellow boundaries.\n\n+ **Gather**: \nAgents are expected to navigate and collect as many green apples as possible\n while avoiding red bombs (Duan et al. 2016). In contrast to the other tasks, \n agents in the gather tasks receive only sparse rewards when reaching apples. 
\n Costs are also sparse and received when touching bombs (Achiam et al. 2017).\n\n+ **Reach**: Agents are supposed to move towards a goal (Ray et al. 2019). As \nsoon as the agent enters the goal zone, the goal is re-spawned such that the agent\nhas to reach a series of goals. Obstacles are placed to prevent the agent from \nfinding trivial solutions. We implemented obstacles with a physical body, with which \nagents can collide and receive costs, and ones without a collision shape that \nproduce costs when traversed. Rewards are dense and increase as the agent moves closer \nto the goal, and a sparse component is obtained when entering the goal zone. \n\n+ **Run**: Agents are rewarded for running through an avenue between two \nsafety boundaries (Chow et al. 2019). The boundaries are non-physical \nbodies which can be penetrated without collision but incur costs. \nAdditional costs are received when exceeding an agent-specific velocity \nthreshold.\n\n\n\nCircle | Gather | Reach | Run\n--- | ---| ---| ---\n![Circle](./docs/figures/task_circle.png) |![Gather](./docs/figures/task_gather.png)|![Reach](./docs/figures/task_reach.png)|![Run](./docs/figures/task_run.png)\n\n\n\n# Installation\n\nHere are the (few) steps to follow to get our repository ready to run.\n\nClone the repository and install the Bullet-Safety-Gym package via pip. Use the \nfollowing three lines:\n\n```\ngit clone https://github.com/SvenGronauer/Bullet-Safety-Gym.git\n\ncd Bullet-Safety-Gym\n\npip install -e .\n```\n\n## Supported Systems\n\nWe currently support Linux and OS X running Python 3.5 or greater.\nWindows should also work (but has not been tested yet).\n\nNote: This package has been tested on Mac OS Mojave and Ubuntu (18.04 LTS, \n20.04 LTS), and is probably fine for most recent Mac and Linux operating \nsystems. 
\n\n## Dependencies \n\nBullet-Safety-Gym heavily depends on two packages:\n\n+ [Gym](https://github.com/openai/gym)\n+ [PyBullet](https://github.com/bulletphysics/bullet3)\n\n# Getting Started\n\nAfter the successful installation of the repository, the Bullet-Safety-Gym \nenvironments can be instantiated simply via `gym.make`. See: \n\n```\n\u003e\u003e\u003e import gym\n\u003e\u003e\u003e import bullet_safety_gym\n\u003e\u003e\u003e env = gym.make('SafetyCarGather-v0')\n```\n\nThe functional interface follows the API of the OpenAI Gym (Brockman et al., \n2016), which consists of the following three important functions:\n\n```\n\u003e\u003e\u003e observation = env.reset()\n\u003e\u003e\u003e random_action = env.action_space.sample()  # usually the action is determined by a policy\n\u003e\u003e\u003e next_observation, reward, done, info = env.step(random_action)\n```\n\nBesides the reward signal, our environments provide an additional cost signal, \nwhich is contained in the `info` dictionary:\n```\n\u003e\u003e\u003e info\n{'cost': 1.0}\n```\n\nA minimal example for visualizing a uniformly random policy in a GUI is the \nfollowing:\n\n```\nimport gym\nimport bullet_safety_gym\n\nenv = gym.make('SafetyAntCircle-v0')\n\nwhile True:  # run episodes until interrupted\n    done = False\n    env.render()  # make GUI of PyBullet appear\n    x = env.reset()\n    while not done:\n        random_action = env.action_space.sample()\n        x, reward, done, info = env.step(random_action)\n```\nNote that visuals are triggered only if the render function is called before \nthe reset function.\n\n\n# List of environments\n\nThe environments are named according to the following scheme:\n```Safety{#agent}{#task}-v0``` where the agent can be any of \n```{Ball, Car, Drone, Ant}``` and the task \ncan be any of ```{Circle, Gather, Reach, Run}```.\n\nThere is also a function that returns all available environments of the \nBullet-Safety-Gym:\n```\nfrom bullet_safety_gym import get_bullet_safety_gym_env_list\n\nenv_list = 
get_bullet_safety_gym_env_list()\n```\n\n## Reach Environments\n\n+ SafetyBallReach-v0\n+ SafetyCarReach-v0\n+ SafetyDroneReach-v0\n+ SafetyAntReach-v0\n\n## Circle Environments\n\n+ SafetyBallCircle-v0\n+ SafetyCarCircle-v0\n+ SafetyDroneCircle-v0\n+ SafetyAntCircle-v0\n\n## Run Environments\n\n+ SafetyBallRun-v0\n+ SafetyCarRun-v0\n+ SafetyDroneRun-v0\n+ SafetyAntRun-v0\n\n## Gather Environments\n\n+ SafetyBallGather-v0\n+ SafetyCarGather-v0\n+ SafetyDroneGather-v0\n+ SafetyAntGather-v0\n\n\n# References\n\n+ Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 22–31, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.\n\n+ G. Brockman, Vicki Cheung, Ludwig Pettersson, J. Schneider, John Schulman, Jie Tang, and W. Zaremba. OpenAI Gym. ArXiv, abs/1606.01540, 2016.\n\n+ Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, and Edgar A. Duéñez-Guzmán. Lyapunov-based safe policy optimization for continuous control. CoRR, abs/1901.10031, 2019.\n\n+ Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1329–1338, New York, New York, USA, 20–22 Jun 2016. PMLR.\n\n+ Alex Ray, Joshua Achiam, and Dario Amodei. Benchmarking Safe Exploration in Deep Reinforcement Learning. 
2019.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSvenGronauer%2FBullet-Safety-Gym","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSvenGronauer%2FBullet-Safety-Gym","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSvenGronauer%2FBullet-Safety-Gym/lists"}