{"id":27595957,"url":"https://github.com/chagmgang/distributed_reinforcement_learning","last_synced_at":"2025-10-05T20:09:37.064Z","repository":{"id":95666860,"uuid":"253718191","full_name":"chagmgang/distributed_reinforcement_learning","owner":"chagmgang","description":"implementation of distributed reinforcement learning with distributed tensorflow","archived":false,"fork":false,"pushed_at":"2021-06-05T07:04:21.000Z","size":120,"stargazers_count":56,"open_issues_count":0,"forks_count":13,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-22T12:37:40.354Z","etag":null,"topics":["apex","distributed-reinforcement-learning","distributed-rl","distributed-tensorflow","impala","r2d2","reinforcement-learning","scalable-reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chagmgang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-04-07T07:24:28.000Z","updated_at":"2025-03-21T16:09:19.000Z","dependencies_parsed_at":"2023-03-24T03:04:36.265Z","dependency_job_id":null,"html_url":"https://github.com/chagmgang/distributed_reinforcement_learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chagmgang/distributed_reinforcement_learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chagmgang%2Fdistributed_reinforcement_learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chagmgang%2Fdistribut
ed_reinforcement_learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chagmgang%2Fdistributed_reinforcement_learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chagmgang%2Fdistributed_reinforcement_learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chagmgang","download_url":"https://codeload.github.com/chagmgang/distributed_reinforcement_learning/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chagmgang%2Fdistributed_reinforcement_learning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278510931,"owners_count":25999005,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apex","distributed-reinforcement-learning","distributed-rl","distributed-tensorflow","impala","r2d2","reinforcement-learning","scalable-reinforcement-learning","tensorflow"],"created_at":"2025-04-22T12:37:35.972Z","updated_at":"2025-10-05T20:09:37.051Z","avatar_url":"https://github.com/chagmgang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Implementation of Distributed Reinforcement Learning with Tensorflow\n\n## Information\n\n* 20 actors with 1 learner.\n* Tensorflow 
implementation using `distributed tensorflow` in a server-client architecture.\n* `Recurrent Experience Replay in Distributed Reinforcement Learning` is implemented in Breakout-Deterministic-v4 as a POMDP (the observation is withheld with 20% probability).\n\n## Dependency\n```\nopencv-python\ngym[atari]\ntensorboardX\ntensorflow==1.14.0\n```\n\n\n## Implementation\n\n- [x] [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf)\n- [x] [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/abs/1802.01561)\n- [x] [DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY](https://arxiv.org/abs/1803.00933)\n- [x] [Recurrent Experience Replay in Distributed Reinforcement Learning](https://openreview.net/forum?id=r1lyTjAqYX)\n\n## How to Run\n\n* A3C: Asynchronous Methods for Deep Reinforcement Learning\n```\nCUDA_VISIBLE_DEVICES=-1 python train_a3c.py --job_name actor --task 0\n\nCUDA_VISIBLE_DEVICES=-1 python train_a3c.py --job_name actor --task 0\nCUDA_VISIBLE_DEVICES=-1 python train_a3c.py --job_name actor --task 1\nCUDA_VISIBLE_DEVICES=-1 python train_a3c.py --job_name actor --task 2\n...\nCUDA_VISIBLE_DEVICES=-1 python train_a3c.py --job_name actor --task 19\n```\n\n* Ape-x: DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY\n```\npython train_apex.py --job_name learner --task 0\n\nCUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 0\nCUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 1\nCUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 2\n...\nCUDA_VISIBLE_DEVICES=-1 python train_apex.py --job_name actor --task 19\n```\n\n* IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures\n```\npython train_impala.py --job_name learner --task 0\n\nCUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 0\nCUDA_VISIBLE_DEVICES=-1 python train_impala.py 
--job_name actor --task 1\nCUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 2\n...\nCUDA_VISIBLE_DEVICES=-1 python train_impala.py --job_name actor --task 19\n```\n\n* R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning\n```\npython train_r2d2.py --job_name learner --task 0\n\nCUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 0\nCUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 1\nCUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 2\n...\nCUDA_VISIBLE_DEVICES=-1 python train_r2d2.py --job_name actor --task 39\n```\n\n# Reference\n\n1. [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/abs/1802.01561)\n2. [DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY](https://arxiv.org/abs/1803.00933)\n3. [Recurrent Experience Replay in Distributed Reinforcement Learning](https://openreview.net/forum?id=r1lyTjAqYX)\n4. [deepmind/scalable_agent](https://github.com/deepmind/scalable_agent)\n5. [google-research/seed_rl](https://github.com/google-research/seed_rl)\n6. [Asynchronous_Advantage_Actor_Critic](https://github.com/alphastarkor/distributed_tensorflow_a3c)\n7. [Relational_Deep_Reinforcement_Learning](https://github.com/RLOpensource/Relational_Deep_Reinforcement_Learning)\n8. [Deep Recurrent Q-Learning for Partially Observable MDPs](https://arxiv.org/abs/1507.06527)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchagmgang%2Fdistributed_reinforcement_learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchagmgang%2Fdistributed_reinforcement_learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchagmgang%2Fdistributed_reinforcement_learning/lists"}