{"id":17671446,"url":"https://github.com/lgvaz/rlbox","last_synced_at":"2026-02-28T05:55:22.632Z","repository":{"id":131907194,"uuid":"97338869","full_name":"lgvaz/rlbox","owner":"lgvaz","description":"RLbox: Solving OpenAI Gym with TensorFlow","archived":false,"fork":false,"pushed_at":"2018-04-19T18:43:10.000Z","size":2752,"stargazers_count":7,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-12T16:53:09.890Z","etag":null,"topics":["atari","continuous-control","deep-reinforcement-learning","deep-rl","dqn","mujoco","openai-gym","ppo","proximal-policy-optimization","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lgvaz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-07-15T19:52:02.000Z","updated_at":"2024-06-26T06:21:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"116087da-5793-4fc3-b12a-002c126ed055","html_url":"https://github.com/lgvaz/rlbox","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lgvaz/rlbox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgvaz%2Frlbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgvaz%2Frlbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgvaz%2Frlbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgvaz%2Frlbox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lgvaz","download_url":"https://codeload.github
.com/lgvaz/rlbox/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgvaz%2Frlbox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29925847,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-27T19:37:42.220Z","status":"online","status_checked_at":"2026-02-28T02:00:07.010Z","response_time":90,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atari","continuous-control","deep-reinforcement-learning","deep-rl","dqn","mujoco","openai-gym","ppo","proximal-policy-optimization","tensorflow"],"created_at":"2024-10-24T03:42:33.048Z","updated_at":"2026-02-28T05:55:22.615Z","avatar_url":"https://github.com/lgvaz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reinforcement Learning Box\n*Discontinued: I'm now working on a [PyTorch RL library](https://github.com/lgvaz/torchrl)*  \nRLbox provides a framework for rapid experimentation with popular Deep Reinforcement Learning algorithms. It focuses on making it easy to implement new ideas, which can be rapidly evaluated using [OpenAI Gym](https://github.com/openai/gym).  \n\n## Installation\n```bash\ngit clone https://github.com/apparatusbox/rlbox.git\ncd rlbox  \npip install -e .  \n```\n\n## How to use  \nExamples of how to run different agents can be found in the [examples](https://github.com/lgvaz/rlbox/tree/master/examples) folder.  
\n\n## Implemented algorithms  \n### State of the art\n* [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) and extensions\n  * [Double Q-learning](https://arxiv.org/pdf/1509.06461.pdf)  \n  * [Dueling networks](https://arxiv.org/pdf/1511.06581.pdf)\n  * N-step learning\n  * Soft target update\n* [PPO](https://arxiv.org/pdf/1707.06347.pdf)\n  * Clipped Surrogate Objective  \n  * Adaptive KL Penalty Coefficient  \n  \n### Classical\n* Vanilla Policy Gradient\n* REINFORCE\n* Actor-Critic\n\n## Results  \n* __DQN on BreakoutNoFrameskip-v4__  \nEpisode 0 ---------------- Episode 3500 ----------- Episode 6000 ----------- Episode 7500 ----------- Episode 21500  \n![episode 0](assets/ep0_nolegend.gif)\n![episode 3500](assets/ep3500_nolegend.gif)\n![episode 6000](assets/ep6000_nolegend.gif)\n![episode 7500](assets/ep7500_nolegend.gif)\n![episode 21500](assets/ep21500_nolegend.gif)   \nMean reward after training: 421 (Averaged over 100 episodes)  \nDark blue: Standard DQN  \nLight blue: Double DQN  \n![Breakout reward](assets/breakout_plots.png)  \n\n* __PPO on Hopper-v1__ [Video](https://www.youtube.com/watch?v=QHAu8EWRJJ0\u0026feature=youtu.be)  \n![Hopper reward](assets/ppo_reward.png)  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgvaz%2Frlbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flgvaz%2Frlbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgvaz%2Frlbox/lists"}