{"id":13678850,"url":"https://github.com/dongminlee94/deep_rl","last_synced_at":"2025-04-05T18:09:44.843Z","repository":{"id":39068925,"uuid":"208984427","full_name":"dongminlee94/deep_rl","owner":"dongminlee94","description":"PyTorch implementation of deep reinforcement learning algorithms","archived":false,"fork":false,"pushed_at":"2021-11-19T14:22:50.000Z","size":31693,"stargazers_count":496,"open_issues_count":2,"forks_count":59,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-03-29T17:12:06.576Z","etag":null,"topics":["a2c","ddpg","ddqn","deep-reinforcement-learning","dqn","model-free-rl","npg","ppo","pytorch","sac","sac-aea","td3","trpo","vpg"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dongminlee94.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-17T07:12:40.000Z","updated_at":"2025-03-21T16:08:46.000Z","dependencies_parsed_at":"2022-07-11T05:16:35.538Z","dependency_job_id":null,"html_url":"https://github.com/dongminlee94/deep_rl","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dongminlee94%2Fdeep_rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dongminlee94%2Fdeep_rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dongminlee94%2Fdeep_rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dongminlee94%2Fdeep_rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dongminlee94","downl
oad_url":"https://codeload.github.com/dongminlee94/deep_rl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247378149,"owners_count":20929297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","ddpg","ddqn","deep-reinforcement-learning","dqn","model-free-rl","npg","ppo","pytorch","sac","sac-aea","td3","trpo","vpg"],"created_at":"2024-08-02T13:00:59.040Z","updated_at":"2025-04-05T18:09:44.819Z","avatar_url":"https://github.com/dongminlee94.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Deep Reinforcement Learning (DRL) Algorithms with PyTorch\n\nThis repository contains PyTorch implementations of deep reinforcement learning algorithms. **The repository will soon be updated to include the PyBullet environments!**\n\n## Algorithms Implemented\n\n1. Deep Q-Network (DQN) \u003csub\u003e\u003csup\u003e ([V. Mnih et al. 2015](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n2. Double DQN (DDQN) \u003csub\u003e\u003csup\u003e ([H. Van Hasselt et al. 2015](https://arxiv.org/abs/1509.06461)) \u003c/sup\u003e\u003c/sub\u003e\n3. Advantage Actor Critic (A2C)\n4. Vanilla Policy Gradient (VPG)\n5. Natural Policy Gradient (NPG) \u003csub\u003e\u003csup\u003e ([S. Kakade et al. 2002](http://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n6. Trust Region Policy Optimization (TRPO) \u003csub\u003e\u003csup\u003e ([J. Schulman et al. 
2015](https://arxiv.org/abs/1502.05477)) \u003c/sup\u003e\u003c/sub\u003e\n7. Proximal Policy Optimization (PPO) \u003csub\u003e\u003csup\u003e ([J. Schulman et al. 2017](https://arxiv.org/abs/1707.06347)) \u003c/sup\u003e\u003c/sub\u003e\n8. Deep Deterministic Policy Gradient (DDPG) \u003csub\u003e\u003csup\u003e ([T. Lillicrap et al. 2015](https://arxiv.org/abs/1509.02971)) \u003c/sup\u003e\u003c/sub\u003e\n9. Twin Delayed DDPG (TD3) \u003csub\u003e\u003csup\u003e ([S. Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477)) \u003c/sup\u003e\u003c/sub\u003e\n10. Soft Actor-Critic (SAC) \u003csub\u003e\u003csup\u003e ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1801.01290)) \u003c/sup\u003e\u003c/sub\u003e\n11. SAC with automatic entropy adjustment (SAC-AEA) \u003csub\u003e\u003csup\u003e ([T. Haarnoja et al. 2018](https://arxiv.org/abs/1812.05905)) \u003c/sup\u003e\u003c/sub\u003e\n\n## Environments Implemented\n\n1. Classic control environments (CartPole-v1, Pendulum-v0, etc.) \u003csub\u003e\u003csup\u003e (as described in [here](https://gym.openai.com/envs/#classic_control)) \u003c/sup\u003e\u003c/sub\u003e\n2. MuJoCo environments (Hopper-v2, HalfCheetah-v2, Ant-v2, Humanoid-v2, etc.) \u003csub\u003e\u003csup\u003e (as described in [here](https://gym.openai.com/envs/#mujoco)) \u003c/sup\u003e\u003c/sub\u003e\n3. 
**PyBullet environments (HopperBulletEnv-v0, HalfCheetahBulletEnv-v0, AntBulletEnv-v0, HumanoidDeepMimicWalkBulletEnv-v1 etc.)** \u003csub\u003e\u003csup\u003e (as described in [here](https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs)) \u003c/sup\u003e\u003c/sub\u003e\n\n## Results (MuJoCo, PyBullet)\n\n### MuJoCo environments\n\n#### Hopper-v2\n\n- Observation space: 8\n- Action space: 3\n\n#### HalfCheetah-v2\n\n- Observation space: 17\n- Action space: 6\n\n#### Ant-v2\n\n- Observation space: 111\n- Action space: 8\n\n#### Humanoid-v2\n\n- Observation space: 376\n- Action space: 17\n\n### PyBullet environments\n\n#### HopperBulletEnv-v0\n\n- Observation space: 15\n- Action space: 3\n\n#### HalfCheetahBulletEnv-v0\n\n- Observation space: 26\n- Action space: 6\n\n#### AntBulletEnv-v0\n\n- Observation space: 28\n- Action space: 8\n\n#### HumanoidDeepMimicWalkBulletEnv-v1\n\n- Observation space: 197\n- Action space: 36\n\n## Requirements\n\n- [PyTorch](https://pytorch.org)\n- [TensorBoard](https://pytorch.org/docs/stable/tensorboard.html)\n- [gym](https://github.com/openai/gym)\n- [mujoco-py](https://github.com/openai/mujoco-py)\n- [PyBullet](https://pybullet.org/wordpress/)\n\n## Usage\n\nThe repository's high-level structure is:\n\n    ├── agents                    \n        └── common \n    ├── results  \n        ├── data \n        └── graphs        \n    └── save_model\n\n### 1) To train the agents on the environments\n\nTo train all the different agents on PyBullet environments, follow these steps:\n\n```commandline\ngit clone https://github.com/dongminlee94/deep_rl.git\ncd deep_rl\npython run_bullet.py\n```\n\nFor other environments, change the last line to `run_cartpole.py`, `run_pendulum.py`, `run_mujoco.py`.\n\nIf you want to change configurations of the agents, follow this step:\n```commandline\npython run_bullet.py \\\n    --env=HumanoidDeepMimicWalkBulletEnv-v1 \\\n    --algo=sac-aea \\\n    --phase=train \\\n    
--render=False \\\n    --load=None \\\n    --seed=0 \\\n    --iterations=200 \\\n    --steps_per_iter=5000 \\\n    --max_step=1000 \\\n    --tensorboard=True \\\n    --gpu_index=0\n```\n\n### 2) To watch the learned agents on the above environments\n\nTo watch all the learned agents on PyBullet environments, follow these steps:\n\n```commandline\npython run_bullet.py \\\n    --env=HumanoidDeepMimicWalkBulletEnv-v1 \\\n    --algo=sac-aea \\\n    --phase=test \\\n    --render=True \\\n    --load=envname_algoname_... \\\n    --seed=0 \\\n    --iterations=200 \\\n    --steps_per_iter=5000 \\\n    --max_step=1000 \\\n    --tensorboard=False \\\n    --gpu_index=0\n```\n\nCopy the saved model name from `save_model/envname_algoname_...` and paste it in place of `envname_algoname_...` in the `--load` argument so that the saved model is loaded.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongminlee94%2Fdeep_rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdongminlee94%2Fdeep_rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongminlee94%2Fdeep_rl/lists"}