{"id":13006169,"url":"https://github.com/XinJingHao/DRL-Pytorch","last_synced_at":"2025-03-04T15:31:14.582Z","repository":{"id":40342440,"uuid":"427926332","full_name":"XinJingHao/DRL-Pytorch","owner":"XinJingHao","description":"Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)","archived":false,"fork":false,"pushed_at":"2025-02-28T12:52:13.000Z","size":58053,"stargazers_count":1959,"open_issues_count":2,"forks_count":239,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-02-28T19:05:34.558Z","etag":null,"topics":["asl","c51","categorical-dqn","ddpg","deep-reinforcement-learning","double-dqn","dueling-dqn","machine-learning","noisynet-dqn","ppo","prioritized-experience-replay","pytorch","q-learning","reinforcement-learning","sac","td3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/XinJingHao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-14T12:46:46.000Z","updated_at":"2025-02-28T15:39:57.000Z","dependencies_parsed_at":"2023-08-21T05:23:08.529Z","dependency_job_id":"5df94238-c882-467c-b1e1-6fd24b808b19","html_url":"https://github.com/XinJingHao/DRL-Pytorch","commit_stats":null,"previous_names":["xinjinghao/deep-reinforcement-learning-algorithms-with-pytorch","xinjinghao/drl-pytorch"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XinJingHa
o%2FDRL-Pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XinJingHao%2FDRL-Pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XinJingHao%2FDRL-Pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XinJingHao%2FDRL-Pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/XinJingHao","download_url":"https://codeload.github.com/XinJingHao/DRL-Pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241872161,"owners_count":20034617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asl","c51","categorical-dqn","ddpg","deep-reinforcement-learning","double-dqn","dueling-dqn","machine-learning","noisynet-dqn","ppo","prioritized-experience-replay","pytorch","q-learning","reinforcement-learning","sac","td3"],"created_at":"2024-07-24T00:36:17.504Z","updated_at":"2025-03-04T15:31:14.577Z","avatar_url":"https://github.com/XinJingHao.png","language":"Python","funding_links":[],"categories":["时间序列"],"sub_categories":["网络服务_其他"],"readme":"\u003cdiv align=center\u003e\n\u003cimg src=\"https://github.com/XinJingHao/RL-Algorithms-by-Pytorch/blob/main/RL_PYTORCH.png\" width=500 /\u003e\n\u003c/div\u003e\n\n\u003cdiv align=center\u003e\nClean, Robust, and Unified PyTorch implementation of popular DRL Algorithms\n\u003c/div\u003e\n\n\u003cdiv align=center\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Python-blue\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Pytorch-ff69b4\" 
/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/DRL-blueviolet\" /\u003e\n\u003c/div\u003e\n\n\u003cbr/\u003e\n\u003cbr/\u003e\n\n## 0.Star History\n\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"70%\" height=\"auto\" src=\"https://api.star-history.com/svg?repos=XinJingHao/Deep-Reinforcement-Learning-Algorithms-with-Pytorch\u0026type=Date\"\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n\n## 1.Dependencies\nThis repository uses the following Python dependencies unless explicitly stated:\n```python\ngymnasium==0.29.1\nnumpy==1.26.1\npytorch==2.1.0\n\npython==3.11.5\n```\n\n\u003cbr/\u003e\n\n## 2.How to use my code\nEnter the folder of the algorithm that you want to use, and run **main.py** to train from scratch:\n```bash\npython main.py\n```\nFor more details, please check the **README.md** file in the corresponding algorithm folder.\n\n\u003cbr/\u003e\n\n## 3. Separate links of the code\n+ [1.Q-learning](https://github.com/XinJingHao/Q-learning)\n+ [2.1Duel Double DQN](https://github.com/XinJingHao/Duel-Double-DQN-Pytorch)\n+ [2.2Noisy Duel DDQN on Atari Game](https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch)\n+ [2.3Prioritized Experience Replay(PER) DQN/DDQN](https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch)\n+ [2.4Categorical DQN (C51)](https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch)\n+ [2.5NoisyNet DQN](https://github.com/XinJingHao/NoisyNet-DQN-Pytorch)\n+ [3.1Proximal Policy Optimization(PPO) for Discrete Action Space](https://github.com/XinJingHao/PPO-Discrete-Pytorch)\n+ [3.2Proximal Policy Optimization(PPO) for Continuous Action Space](https://github.com/XinJingHao/PPO-Continuous-Pytorch)\n+ [4.1Deep Deterministic Policy Gradient(DDPG)](https://github.com/XinJingHao/DDPG-Pytorch)\n+ [4.2Twin Delayed Deep Deterministic Policy Gradient(TD3)](https://github.com/XinJingHao/TD3-Pytorch)\n+ [5.1Soft Actor Critic(SAC) for Discrete Action Space](https://github.com/XinJingHao/SAC-Discrete-Pytorch)\n+ [5.2Soft Actor 
Critic(SAC) for Continuous Action Space](https://github.com/XinJingHao/SAC-Continuous-Pytorch)\n+ [6.Actor-Sharer-Learner(ASL)](https://github.com/XinJingHao/Actor-Sharer-Learner)\n\n\u003cbr/\u003e\n\n## 4. Recommended Resources for DRL\n### 4.1 Simulation Environments:\n+ [gym](https://www.gymlibrary.dev/) and [gymnasium](https://gymnasium.farama.org/) (Lightweight \u0026 Standard Env for DRL; Easy to start; Slow):\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"60%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/Env_images/gym.gif\"\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n+ [Isaac Sim](https://developer.nvidia.com/isaac/sim#isaac-lab) (NVIDIA’s physics simulation environment; GPU accelerated; Superfast):\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"60%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/Env_images/IsaacGym.gif\"\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n+ [Sparrow](https://github.com/XinJingHao/Sparrow-V2) (Light Weight Simulator for Mobile Robot; DRL friendly):\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"62%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V1/render.gif\"\u003e\n\u003c/div\u003e\n\n\u003cp align=\"left\"\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V2/case_v2.gif\" width=\"10%\" height=\"auto\"  /\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V2/case2.gif\" width=\"10%\" height=\"auto\" /\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V2/play.gif\" width=\"10%\" height=\"auto\" /\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V3/N1.gif\" width=\"10%\" height=\"auto\" /\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V3/N3.gif\" width=\"10%\" height=\"auto\" /\u003e\n  \u003cimg src=\"https://github.com/XinJingHao/Images/blob/main/Sparrow_V3/N10.gif\" width=\"10%\" 
height=\"auto\" /\u003e\n\u003c/p\u003e\n\n\u003cbr/\u003e\n\n+ [ROS](https://www.ros.org/) (Popular \u0026 Comprehensive physical simulator for robots; Heavy and Slow):\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"60%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/Env_images/ros.mp4.gif\"\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n+ [Webots](https://cyberbotics.com/) (Popular physical simulator for robots; Faster than ROS; Less realistic):\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"60%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/Env_images/webots.gif\"\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n+ [Envpool](https://envpool.readthedocs.io/en/latest/index.html) (Fast Vectorized Env)\n+ [Other Popular Envs](https://github.com/clvrai/awesome-rl-envs)\n\n### 4.2 Books：\n+ [《Reinforcement learning: An introduction》](https://books.google.com.sg/books?hl=zh-CN\u0026lr=\u0026id=uWV0DwAAQBAJ\u0026oi=fnd\u0026pg=PR7\u0026dq=Reinforcement+Learning\u0026ots=mivIu01Xp6\u0026sig=zQ6jkZRxJop4fkAgScMgzULGlbY\u0026redir_esc=y#v=onepage\u0026q\u0026f=false)--Richard S. 
Sutton\n+ 《深度学习入门：基于Python的理论与实现》 (Introduction to Deep Learning: Theory and Implementation with Python)--斋藤康毅 (Koki Saitoh)\n\n### 4.3 Online Courses:\n+ [RL Courses(bilibili)](https://www.bilibili.com/video/BV1UE411G78S?p=1\u0026vd_source=df4b7370976f5ca5034cc18488eec368)--李宏毅(Hongyi Li)\n+ [RL Courses(YouTube)](https://www.youtube.com/watch?v=z95ZYgPgXOY\u0026list=PLJV_el3uVTsODxQFgzMzPLa16h6B8kWM_)--李宏毅(Hongyi Li)\n+ [UCL Course on RL](https://www.davidsilver.uk/teaching/)--David Silver\n+ [Hands-on Reinforcement Learning (动手强化学习)](https://hrl.boyuai.com/chapter/1/%E5%88%9D%E6%8E%A2%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0)--Shanghai Jiao Tong University\n+ [DRL Courses](https://github.com/wangshusen/DRL)--Shusen Wang\n\n### 4.4 Blogs:\n+ [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/)\n+ [Policy Gradient Theorem --Cangxi](https://zhuanlan.zhihu.com/p/491647161)\n+ [Policy Gradient Algorithms --Lilian](https://lilianweng.github.io/posts/2018-04-08-policy-gradient/)\n+ [Theorem of PPO](https://zhuanlan.zhihu.com/p/563166533)\n+ [The 37 Implementation Details of Proximal Policy Optimization](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/)\n+ [Prioritized Experience Replay](https://zhuanlan.zhihu.com/p/631171588)\n+ [Soft Actor Critic](https://zhuanlan.zhihu.com/p/566722896)\n+ [A (Long) Peek into Reinforcement Learning --Lilian](https://lilianweng.github.io/posts/2018-02-19-rl-overview/)\n+ [Introduction to TD3](https://zhuanlan.zhihu.com/p/409536699)\n\n\u003cbr/\u003e\n\n## 5. Important Papers\nDQN: [Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. nature, 2015, 518(7540): 529-533.](https://www.nature.com/articles/nature14236/?source=post_page)\n\nDouble DQN: [Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2016, 30(1).](https://ojs.aaai.org/index.php/AAAI/article/view/10295)\n\nDuel DQN: [Wang, Ziyu, et al. 
\"Dueling network architectures for deep reinforcement learning.\" International conference on machine learning. PMLR, 2016.](https://proceedings.mlr.press/v48/wangf16.pdf)\n\nPER: [Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.](https://arxiv.org/abs/1511.05952)\n\nC51: [Bellemare M G, Dabney W, Munos R. A distributional perspective on reinforcement learning[C]//International conference on machine learning. PMLR, 2017: 449-458.](https://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf)\n\nNoisyNet DQN: [Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration[J]. arXiv preprint arXiv:1706.10295, 2017.](https://arxiv.org/abs/1706.10295)\n\nPPO: [Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.](https://arxiv.org/pdf/1707.06347.pdf)\n\nDDPG: [Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.](https://arxiv.org/abs/1509.02971)\n\nTD3: [Fujimoto S, Hoof H, Meger D. Addressing function approximation error in actor-critic methods[C]//International conference on machine learning. PMLR, 2018: 1587-1596.](https://proceedings.mlr.press/v80/fujimoto18a.html)\n\nSAC: [Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. PMLR, 2018: 1861-1870.](https://proceedings.mlr.press/v80/haarnoja18b)\n\nASL: [Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity](https://arxiv.org/abs/2305.04180)\n\nColorDynamic: [Generalizable, Scalable, Real-time, End-to-end Local Planner for Unstructured and Dynamic Environments](https://arxiv.org/abs/2502.19892)\n\n\u003cbr/\u003e\n\n\n## 6. 
Training Curves of my Code:\n\n### [Q-learning:](https://github.com/XinJingHao/Q-learning)\n\u003cimg src=\"https://github.com/XinJingHao/Q-learning/blob/main/result.svg\" width=320\u003e\n\n### [Duel Double DQN:](https://github.com/XinJingHao/Duel-Double-DQN-Pytorch)\n|                           CartPole                           |                         LunarLander                          |\n| :----------------------------------------------------------: | :----------------------------------------------------------: |\n\u003cimg src=\"https://github.com/XinJingHao/DQN-DDQN-Pytorch/blob/main/IMGs/cp_all.png\" width=\"320\" height=\"200\"\u003e | \u003cimg src=\"https://github.com/XinJingHao/DQN-DDQN-Pytorch/blob/main/IMGs/lld_all.png\" width=\"320\" height=\"200\"\u003e\n\n\n\n### [Noisy Duel DDQN on Atari Game:](https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch)\nPong| Enduro\n:-----------------------:|:-----------------------:|\n\u003cimg src=\"https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch/blob/main/IMGs/Pong.png\" width=\"320\" height=\"200\"\u003e| \u003cimg src=\"https://github.com/XinJingHao/Noisy-Duel-DDQN-Atari-Pytorch/blob/main/IMGs/Enduro.png\" width=\"320\" height=\"200\"\u003e\n\n\u003cbr/\u003e\n\n\n### [Prioritized DQN/DDQN:](https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch)\n|                           CartPole                           |                         LunarLander                          |\n| :----------------------------------------------------------: | :----------------------------------------------------------: |\n| \u003cimg src=\"https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch/blob/main/LightPriorDQN_gym0.2x/IMGs/CPV1.svg\" width=\"320\" height=\"200\"\u003e | \u003cimg src=\"https://github.com/XinJingHao/Prioritized-DQN-DDQN-Pytorch/blob/main/LightPriorDQN_gym0.2x/IMGs/LLDV2.svg\" width=\"320\" height=\"200\"\u003e |\n\n\u003cbr/\u003e\n\n### [Categorical 
DQN:](https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch)\n|                           CartPole                           |                         LunarLander                          |\n| :----------------------------------------------------------: | :----------------------------------------------------------: |\n| \u003cimg src=\"https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch/blob/main/Images/cp.svg\" width=\"320\" height=\"200\"\u003e | \u003cimg src=\"https://github.com/XinJingHao/C51-Categorical-DQN-Pytorch/blob/main/Images/lld.svg\" width=\"320\" height=\"200\"\u003e |\n\n\u003cbr/\u003e\n\n### [NoisyNet DQN:](https://github.com/XinJingHao/NoisyNet-DQN-Pytorch)\n|                           CartPole                           |                         LunarLander                          |\n| :----------------------------------------------------------: | :----------------------------------------------------------: |\n| \u003cimg src=\"https://github.com/XinJingHao/NoisyNet-DQN-Pytorch/blob/main/IMGs/cpv1.png\" width=\"320\" height=\"200\"\u003e | \u003cimg src=\"https://github.com/XinJingHao/NoisyNet-DQN-Pytorch/blob/main/IMGs/lldv2.png\" width=\"320\" height=\"200\"\u003e |\n\n\u003cbr/\u003e\n\n### [PPO Discrete:](https://github.com/XinJingHao/PPO-Discrete-Pytorch)\n\u003cimg src=\"https://github.com/XinJingHao/PPO-Discrete-Pytorch/blob/main/result.jpg\" width=700\u003e\n\n### [PPO Continuous:](https://github.com/XinJingHao/PPO-Continuous-Pytorch)\n\u003cimg src=\"https://github.com/XinJingHao/PPO-Continuous-Pytorch/blob/main/ppo_result.jpg\"\u003e\n\n### [DDPG:](https://github.com/XinJingHao/DDPG-Pytorch)\nPendulum| LunarLanderContinuous\n:-----------------------:|:-----------------------:|\n\u003cimg src=\"https://github.com/XinJingHao/DDPG-Pytorch/blob/main/IMGs/ddpg_pv0.svg\" width=\"320\" height=\"200\"\u003e| \u003cimg src=\"https://github.com/XinJingHao/DDPG-Pytorch/blob/main/IMGs/ddpg_lld.svg\" width=\"320\" 
height=\"200\"\u003e \n\n\u003cbr/\u003e\n\n### [TD3:](https://github.com/XinJingHao/TD3-Pytorch)\n\u003cimg src=\"https://github.com/XinJingHao/TD3-Pytorch/blob/main/images/TD3results.png\" width=700\u003e\n\n### [SAC Continuous:](https://github.com/XinJingHao/SAC-Continuous-Pytorch)\n\u003cimg src=\"https://github.com/XinJingHao/SAC-Continuous-Pytorch/blob/main/imgs/result.jpg\" width=700\u003e\n\n### [SAC Discrete:](https://github.com/XinJingHao/SAC-Discrete-Pytorch)\n\u003cimg src=\"https://github.com/XinJingHao/SAC-Discrete-Pytorch/blob/main/imgs/sacd_result.jpg\" width=700\u003e\n\n### [Actor-Sharer-Learner (ASL):](https://github.com/XinJingHao/Actor-Sharer-Learner)\n\u003cdiv align=\"left\"\u003e\n\u003cimg width=\"70%\" height=\"auto\" src=\"https://github.com/XinJingHao/Images/blob/main/asl/ss_e.svg\"\u003e\n\u003c/div\u003e\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXinJingHao%2FDRL-Pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FXinJingHao%2FDRL-Pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FXinJingHao%2FDRL-Pytorch/lists"}