{"id":19736018,"url":"https://github.com/ikostrikov/pytorch-a3c","last_synced_at":"2025-05-16T11:03:49.901Z","repository":{"id":43910157,"uuid":"81783074","full_name":"ikostrikov/pytorch-a3c","owner":"ikostrikov","description":"PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from \"Asynchronous Methods for Deep Reinforcement Learning\".","archived":false,"fork":false,"pushed_at":"2019-09-25T18:08:56.000Z","size":210,"stargazers_count":1265,"open_issues_count":25,"forks_count":281,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-04-19T13:58:48.317Z","etag":null,"topics":["a3c","actor-critic","asynch","asynchronous-advantage-actor-critic","asynchronous-methods","deep-learning","deep-reinforcement-learning","python","pytorch","pytorch-a3c","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ikostrikov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-13T03:57:55.000Z","updated_at":"2025-04-17T16:35:21.000Z","dependencies_parsed_at":"2022-08-31T01:01:34.591Z","dependency_job_id":null,"html_url":"https://github.com/ikostrikov/pytorch-a3c","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikostrikov%2Fpytorch-a3c","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikostrikov%2Fpytorch-a3c/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ikostrikov%2Fpytorch-a3c/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/ikostrikov%2Fpytorch-a3c/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ikostrikov","download_url":"https://codeload.github.com/ikostrikov/pytorch-a3c/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254518384,"owners_count":22084374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a3c","actor-critic","asynch","asynchronous-advantage-actor-critic","asynchronous-methods","deep-learning","deep-reinforcement-learning","python","pytorch","pytorch-a3c","reinforcement-learning"],"created_at":"2024-11-12T01:04:46.854Z","updated_at":"2025-05-16T11:03:49.880Z","avatar_url":"https://github.com/ikostrikov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pytorch-a3c\n\nThis is a PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from [\"Asynchronous Methods for Deep Reinforcement Learning\"](https://arxiv.org/pdf/1602.01783v1.pdf).\n\nThis implementation is inspired by [Universe Starter Agent](https://github.com/openai/universe-starter-agent).\nIn contrast to the starter agent, it uses an optimizer with shared statistics as in the original paper.\n\nPlease use this bibtex if you want to cite this repository in your publications:\n\n    @misc{pytorchaaac,\n      author = {Kostrikov, Ilya},\n      title = {PyTorch Implementations of Asynchronous Advantage Actor Critic},\n      year = {2018},\n      publisher = {GitHub},\n      journal = {GitHub repository},\n      howpublished = 
{\\url{https://github.com/ikostrikov/pytorch-a3c}},\n    }\n\n## A2C\n\nI **highly recommend** checking out the synchronous version and other algorithms: [pytorch-a2c-ppo-acktr](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).\n\nIn my experience, A2C works better than A3C, and ACKTR is better than both of them. Moreover, PPO is a great algorithm for continuous control. Thus, I recommend trying A2C/PPO/ACKTR first and using A3C only if you specifically need it for some reason.\n\nAlso read the [OpenAI blog](https://blog.openai.com/baselines-acktr-a2c/) for more information.\n\n## Contributions\n\nContributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.\n\n## Usage\n```bash\n# Works only with Python 3.\npython3 main.py --env-name \"PongDeterministic-v4\" --num-processes 16\n```\n\nThis code runs evaluation in a separate thread in addition to the 16 processes.\n\n## Results\n\nWith 16 processes, it converges on PongDeterministic-v4 in 15 minutes.\n![PongDeterministic-v4](images/PongReward.png)\n\nFor BreakoutDeterministic-v4, convergence takes several hours or more.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikostrikov%2Fpytorch-a3c","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fikostrikov%2Fpytorch-a3c","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fikostrikov%2Fpytorch-a3c/lists"}