{"id":17120436,"url":"https://github.com/anjum48/rl-examples","last_synced_at":"2025-10-06T20:02:47.728Z","repository":{"id":154466890,"uuid":"110717978","full_name":"Anjum48/rl-examples","owner":"Anjum48","description":"Examples of published reinforcement learning algorithms in recent literature implemented in TensorFlow","archived":false,"fork":false,"pushed_at":"2020-08-03T09:42:11.000Z","size":17800,"stargazers_count":103,"open_issues_count":0,"forks_count":26,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-10-06T20:01:53.123Z","etag":null,"topics":["artificial-intelligence","openai-gym","python","reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Anjum48.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-11-14T16:47:40.000Z","updated_at":"2025-04-21T16:51:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"20e5d111-6a07-4b8e-a3a7-61ac71ed02cd","html_url":"https://github.com/Anjum48/rl-examples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Anjum48/rl-examples","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anjum48%2Frl-examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anjum48%2Frl-examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anjum48%2Frl-examples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/Anjum48%2Frl-examples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Anjum48","download_url":"https://codeload.github.com/Anjum48/rl-examples/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anjum48%2Frl-examples/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278671745,"owners_count":26025745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","openai-gym","python","reinforcement-learning","tensorflow"],"created_at":"2024-10-14T17:59:52.957Z","updated_at":"2025-10-06T20:02:47.723Z","avatar_url":"https://github.com/Anjum48.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# rl-examples\nExamples of published reinforcement learning algorithms in recent\nliterature implemented in TensorFlow.\nMost of my research is in the continuous domain, and I haven't spent much\ntime testing these in discrete domains such as Atari etc.\n\n![PPO LSTM solving BipedalWalker-v2](https://github.com/Anjum48/rl-examples/blob/master/ppo/BipedalWalker-v2.gif)\n![PPO solving CarRacing-v0](https://github.com/Anjum48/rl-examples/blob/master/ppo/CarRacing-v0.gif)\n\n*BipedalWalker-v2 solved using DPPO with a LSTM layer. 
CarRacing-v0 solved using PPO with a joined actor-critic network*\n\n## Algorithms Implemented\nThanks to DeepMind and OpenAI for making their research openly available.\nBig thanks also to the TensorFlow community.\n\n| Algorithm | Paper                                                   |\n| --------- | ------------------------------------------------------- |\n| DDPG      | [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)     |\n| A3C       | [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1602.01783)    |\n| PPO       | [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)                 |\n| DPPO      | [Emergence of Locomotion Behaviours in Rich Environments](https://arxiv.org/abs/1707.02286) |\n| GAE       | [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438) |\n\n- GAE was used in all algorithms except for DDPG.\n- Where possible, I've added an LSTM layer to the policy and value functions.\nThis achieved higher scores in some environments, but can cause stability issues.\n- In some environments, having a joint network for the actor \u0026 critic performs better (i.e. where CNNs are used).\nThese scripts carry a `_joined` suffix, e.g. `ppo_joined.py`.\n\n## Training\nAll the Python scripts are standalone (but share some common functions in `utils.py`).\nJust run them directly in your IDE. 
Alternatively, run them from a terminal as modules using the `-m` flag:\n\n```\nrl-examples$ python3 -m ppo.ppo_joined\n```\n\nThe models and TensorBoard summaries are saved in the same directory as the script.\nDPPO has a helper script to launch the worker threads:\n\n```\nrl-examples$ sh dppo/start_dppo.sh\n```\n\n## Requirements\n- Python 3.6+\n- OpenAI Gym 0.10.3+\n- TensorFlow 1.11\n- NumPy 1.13+\n\nDPPO was tested on a 16-core machine using CPU only, so the helper\nscript will need to be updated for your particular setup.\nFor my setup, training BipedalWalker on the GPU (GTX 1080) was usually no faster than on the\nCPU, but CarRacing did get a performance boost from the GPU because it uses CNN layers.\n\n## Issues/To-dos\n- Work is needed to find suitable parameters for PPO in discrete action spaces such as Atari.\n- The LSTM batching in A3C is incorrect and needs fixing (see `ppo_lstm.py` for the correct implementation).\n- Distributed Proximal Policy Optimisation with the LSTM (`dppo_lstm.py`) is sometimes a bit unstable,\nbut does work at low learning rates.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanjum48%2Frl-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanjum48%2Frl-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanjum48%2Frl-examples/lists"}