{"id":11563767,"url":"https://github.com/dgriff777/a3c_continuous","last_synced_at":"2025-10-03T14:31:07.616Z","repository":{"id":108171191,"uuid":"115178522","full_name":"dgriff777/a3c_continuous","owner":"dgriff777","description":"A continuous action space version of A3C LSTM in pytorch plus A3G design","archived":false,"fork":false,"pushed_at":"2024-04-19T21:55:11.000Z","size":65084,"stargazers_count":258,"open_issues_count":2,"forks_count":59,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-09-29T14:31:50.596Z","etag":null,"topics":["a3c","a3c-gpu","a3c-lstm","a3g","openai-gym","pytorch","pytorch-a3c"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dgriff777.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.MD","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-12-23T07:19:33.000Z","updated_at":"2024-09-15T09:43:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"6198cfd5-e303-4a9c-bdf3-2f851f865ba0","html_url":"https://github.com/dgriff777/a3c_continuous","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgriff777%2Fa3c_continuous","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgriff777%2Fa3c_continuous/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgriff777%2Fa3c_continuous/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dgriff777%2Fa3c_continuous/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/dgriff777","download_url":"https://codeload.github.com/dgriff777/a3c_continuous/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235146406,"owners_count":18943257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a3c","a3c-gpu","a3c-lstm","a3g","openai-gym","pytorch","pytorch-a3c"],"created_at":"2024-06-23T05:59:48.591Z","updated_at":"2025-10-03T14:31:05.777Z","avatar_url":"https://github.com/dgriff777.png","language":"Python","funding_links":[],"categories":["时间序列"],"sub_categories":["网络服务_其他"],"readme":"*Update: Major update providing large training performance gains as well as code working with latest versions of pytorch and gym libraries. 
With the updated code it is now possible to train a successful model that averages 300+ on the BipedalWalkerHardcore-v3 env in just 20-40 mins using just the CPU!!\n\n* A3G A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!!\n*Training with A3G benefits training speed most when using larger models, i.e. when using raw pixels for observations, such as in Atari environments that use raw pixels for state representation*\n\n# RL A3C Pytorch Continuous\n\n![A3C LSTM playing BipedalWalkerHardcore-v3](https://github.com/dgriff777/a3c_continuous/blob/master/demo/BPHC.gif)\n\nThis repository includes my implementation of reinforcement learning using Asynchronous Advantage Actor-Critic (A3C) in Pytorch, an algorithm from Google DeepMind's paper \"Asynchronous Methods for Deep Reinforcement Learning.\"\n\n# A3G!!\nA new implementation of A3C that utilizes the GPU to speed up training, which we can call **A3G**. As opposed to other versions that try to utilize the GPU with the A3C algorithm, in A3G each agent has its own network maintained on the GPU while the shared model stays on the CPU. Agent models are quickly converted to CPU to update the shared model, which allows updates to be frequent and fast by utilizing Hogwild training, making updates to the shared model asynchronously and without locks. This new method greatly increases training speed, as can be seen in my [rl_a3c_pytorch][55] repo, where training that used to take days can be done in as little as 10 minutes for some Atari games!\n\n[55]: https://github.com/dgriff777/rl_a3c_pytorch\n\n### A3C LSTM\n\nThis is the continuous domain version of my other A3C repo. Here I show that A3C can solve not only BipedalWalker-v3 but also the much harder BipedalWalkerHardcore-v3 version. 
\"Solved\" means training a model capable of averaging a reward over 300 for 100 consecutive episodes.\n\n## Requirements\n\n- Python 3.7+\n- openai gym==0.26.2\n- Pytorch\n- spdlog (a much faster logging library than the standard python logging library)\n- setproctitle\n\n## Training\n*When training a model it is important to limit the number of worker processes to the number of cpu cores available, as too many processes (e.g. more than one process per cpu core available) will actually be detrimental to training speed and effectiveness*\n\nTo train an agent in the BipedalWalker-v3 environment with 8 different worker processes:\n*On a 2014 MacBook Pro laptop training typically takes less than 5 mins to converge to a winning solution*\n\n```\npython main.py --env BipedalWalker-v3 --optimizer Adam --shared-optimizer --workers 8 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger\n```\n\n![Graph of training run for BipedalWalker-v3](https://github.com/dgriff777/a3c_continuous/blob/master/demo/BW3_Rewards_graph.jpg)\nGraph showing training of a BipedalWalker-v3 agent with the above command on a MacBook Pro. 
Train a successful model in 10 mins on your laptop!\n\nTo tail the training log for the above command, use the following command:\n```\ntail -f logs/BipedalWalker-v3_log\n```\n\nTo train an agent in the BipedalWalkerHardcore-v3 environment with 18 different worker processes:\n*BipedalWalkerHardcore-v3 is a much harder environment compared to the normal BipedalWalker*\n*Training a successful model that can achieve a 300+ avg reward on a 100 episode test typically takes 20-40 mins*\n\n```\npython main.py --env BipedalWalkerHardcore-v3 --optimizer Adam --shared-optimizer --workers 18 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger\n```\n\n![Graph of training run for BipedalWalkerHardcore-v3](https://github.com/dgriff777/a3c_continuous/blob/master/demo/BWH3_Rewards_graph.jpg)\nGraph showing training of a BipedalWalkerHardcore-v3 agent with the above command, producing a successful model in under 30 mins!\n\n\nTo tail the training log for the above command, use the following command:\n```\ntail -f logs/BipedalWalkerHardcore-v3_log\n```\n\nHit Ctrl+C to end the training session properly.\n\n## Evaluation\nTo run a 100 episode gym evaluation with a trained model:\n```\npython gym_eval.py --env BipedalWalkerHardcore-v3 --num-episodes 100\n```\n\n## Project Reference\n\n- https://github.com/ikostrikov/pytorch-a3c\n- https://github.com/andrewliao11/pytorch-a3c-mujoco\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgriff777%2Fa3c_continuous","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdgriff777%2Fa3c_continuous","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdgriff777%2Fa3c_continuous/lists"}