{"id":15026636,"url":"https://github.com/khrylx/pytorch-rl","last_synced_at":"2025-04-12T19:50:47.412Z","repository":{"id":39737665,"uuid":"107290819","full_name":"Khrylx/PyTorch-RL","owner":"Khrylx","description":"PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.","archived":false,"fork":false,"pushed_at":"2021-02-09T16:17:59.000Z","size":31981,"stargazers_count":1184,"open_issues_count":14,"forks_count":191,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-04-04T00:07:46.242Z","etag":null,"topics":["a2c","deep-reinforcement-learning","fisher-vectors","generative-adversarial-network","policy-gradient","ppo","proximal-policy-optimization","pytorch","pytorch-rl","reinforcement-learning","trpo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Khrylx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-17T15:50:29.000Z","updated_at":"2025-04-03T00:31:23.000Z","dependencies_parsed_at":"2022-07-09T13:47:53.687Z","dependency_job_id":null,"html_url":"https://github.com/Khrylx/PyTorch-RL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khrylx%2FPyTorch-RL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khrylx%2FPyTorch-RL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Khrylx%2FPyTorch-RL/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/Khrylx%2FPyTorch-RL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Khrylx","download_url":"https://codeload.github.com/Khrylx/PyTorch-RL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625501,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","deep-reinforcement-learning","fisher-vectors","generative-adversarial-network","policy-gradient","ppo","proximal-policy-optimization","pytorch","pytorch-rl","reinforcement-learning","trpo"],"created_at":"2024-09-24T20:04:49.304Z","updated_at":"2025-04-12T19:50:47.390Z","avatar_url":"https://github.com/Khrylx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyTorch implementation of reinforcement learning algorithms\nThis repository contains:\n1. policy gradient methods (TRPO, PPO, A2C)\n2. [Generative Adversarial Imitation Learning (GAIL)](https://arxiv.org/pdf/1606.03476.pdf)\n\n## Important notes\n- The code now works for PyTorch 0.4. For PyTorch 0.3, please check out the 0.3 branch.\n- To run mujoco environments, first install [mujoco-py](https://github.com/openai/mujoco-py) and [gym](https://github.com/openai/gym).\n- If you have a GPU, I recommend setting the OMP_NUM_THREADS to 1 (PyTorch will create additional threads when performing computations which can damage the performance of multiprocessing. 
This problem is most serious on Linux, where multiprocessing can be even slower than a single thread):\n```\nexport OMP_NUM_THREADS=1\n```\n\n## Features\n* Supports discrete and continuous action spaces.\n* Supports multiprocessing, so the agent can collect samples in multiple environments simultaneously (8x faster than a single thread).\n* Fast Fisher vector product calculation. For this part, Ankur kindly wrote a [blog post](http://www.telesens.co/2018/06/09/efficiently-computing-the-fisher-vector-product-in-trpo/) explaining the implementation details.\n\n## Policy gradient methods\n* [Trust Region Policy Optimization (TRPO)](https://arxiv.org/pdf/1502.05477.pdf) -\u003e [examples/trpo_gym.py](https://github.com/Khrylx/PyTorch-RL/blob/master/examples/trpo_gym.py)\n* [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf) -\u003e [examples/ppo_gym.py](https://github.com/Khrylx/PyTorch-RL/blob/master/examples/ppo_gym.py)\n* [Synchronous A3C (A2C)](https://arxiv.org/pdf/1602.01783.pdf) -\u003e [examples/a2c_gym.py](https://github.com/Khrylx/PyTorch-RL/blob/master/examples/a2c_gym.py)\n\n### Example\n* `python examples/ppo_gym.py --env-name Hopper-v2`\n\n### Reference\n* [ikostrikov/pytorch-trpo](https://github.com/ikostrikov/pytorch-trpo)\n* [openai/baselines](https://github.com/openai/baselines)\n\n\n## Generative Adversarial Imitation Learning (GAIL)\n### To save trajectory\n* `python gail/save_expert_traj.py --model-path assets/learned_models/Hopper-v2_ppo.p`\n### To do imitation learning\n* `python gail/gail_gym.py --env-name Hopper-v2 --expert-traj-path assets/expert_traj/Hopper-v2_expert_traj.p`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhrylx%2Fpytorch-rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkhrylx%2Fpytorch-rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhrylx%2Fpytorch-rl/lists"}