{"id":15034787,"url":"https://github.com/sweetice/deep-reinforcement-learning-with-pytorch","last_synced_at":"2025-05-14T20:07:41.815Z","repository":{"id":38420821,"uuid":"136681445","full_name":"sweetice/Deep-reinforcement-learning-with-pytorch","owner":"sweetice","description":"PyTorch implementation of DQN, AC,  ACER, A2C, A3C, PG,  DDPG, TRPO, PPO, SAC, TD3 and ....","archived":false,"fork":false,"pushed_at":"2023-03-24T23:36:09.000Z","size":44128,"stargazers_count":4267,"open_issues_count":29,"forks_count":876,"subscribers_count":35,"default_branch":"master","last_synced_at":"2025-05-01T23:35:53.823Z","etag":null,"topics":["a2c","a3c","actor-critic","actor-critic-algorithm","algorithm","alphago","deep-learning","deep-reinforcement-learning","dqn","policy-gradient","ppo","pytorch","reinforce","resnet","sac","sarsa","td3","trpo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sweetice.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-06-09T01:43:02.000Z","updated_at":"2025-05-01T03:12:57.000Z","dependencies_parsed_at":"2022-07-14T04:40:29.197Z","dependency_job_id":"65daa8b6-8e04-4e51-b56e-a015c708b34a","html_url":"https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sweetice%2FDeep-reinforcement-learning-with-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sweetice%2FDeep-reinforcement-learning-with-pytorch/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sweetice%2FDeep-reinforcement-learning-with-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sweetice%2FDeep-reinforcement-learning-with-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sweetice","download_url":"https://codeload.github.com/sweetice/Deep-reinforcement-learning-with-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254219373,"owners_count":22034397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","a3c","actor-critic","actor-critic-algorithm","algorithm","alphago","deep-learning","deep-reinforcement-learning","dqn","policy-gradient","ppo","pytorch","reinforce","resnet","sac","sarsa","td3","trpo"],"created_at":"2024-09-24T20:26:20.183Z","updated_at":"2025-05-14T20:07:41.759Z","avatar_url":"https://github.com/sweetice.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"**Status:** Active (under active development, breaking changes may occur)\n\nThis repository will implement the classic and state-of-the-art deep reinforcement learning algorithms. The aim of this repository is to provide clear PyTorch code for people learning deep reinforcement learning algorithms. 
\n\nIn the future, more state-of-the-art algorithms will be added and the existing code will also be maintained.\n\n![demo](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/figures/grid.gif)\n\n## Requirements\n- python \u003c=3.6 \n- tensorboardX\n- gym \u003e= 0.10\n- pytorch \u003e= 0.4\n\n**Note that tensorflow does not support python3.7** \n\n## Installation\n\n```\npip install -r requirements.txt\n```\n\nIf that fails:  \n\n- Install gym\n\n```\npip install gym\n```\n\n\n\n- Install PyTorch\n```bash\n# Please go to the official website to install it: https://pytorch.org/\n\n# We recommend using an Anaconda virtual environment to manage your packages.\n\n```\n\n- Install tensorboardX\n```bash\npip install tensorboardX\npip install tensorflow==1.12\n```\n\n- Test \n```\ncd Char10\\ TD3/\npython TD3_BipedalWalker-v2.py --mode test\n```\n\nIf the installation succeeded, you should see a BipedalWalker.\n\nBipedalWalker: \n\n![](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/figures/test.png)\n\n- Install openai-baselines (**Optional**)\n\n```bash\n# clone the openai baselines\ngit clone https://github.com/openai/baselines.git\ncd baselines\npip install -e .\n\n```\n\n## DQN\n\nHere I uploaded two DQN models, trained on CartPole-v0 and MountainCar-v0.\n\n### Tips for MountainCar-v0\n\nThis is a sparse, binary reward task: the reward is non-zero only when the car reaches the top of the mountain. In general, it may take about 1e5 steps under a stochastic policy. You can add a shaping term to the reward, for example one that is positively related to the car's current position. 
Of course, there is a more advanced approach: inverse reinforcement learning.\n\n![value_loss](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char01%20DQN/DQN/pic/value_loss.jpg)   \n![step](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char01%20DQN/DQN/pic/finish_episode.jpg) \nThis is the value loss for DQN. The loss increased to 1e13, yet the network still works well. As training goes on, the target_net and act_net diverge considerably, so the computed loss grows large. The earlier loss was small because the reward was very sparse, resulting in only small updates to the two networks.\n\n### Papers Related to DQN\n\n\n  1. Playing Atari with Deep Reinforcement Learning [[arxiv]](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/1.dqn.ipynb)\n  2. Deep Reinforcement Learning with Double Q-learning [[arxiv]](https://arxiv.org/abs/1509.06461) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/2.double%20dqn.ipynb)\n  3. Dueling Network Architectures for Deep Reinforcement Learning [[arxiv]](https://arxiv.org/abs/1511.06581) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/3.dueling%20dqn.ipynb)\n  4. Prioritized Experience Replay [[arxiv]](https://arxiv.org/abs/1511.05952) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/4.prioritized%20dqn.ipynb)\n  5. Noisy Networks for Exploration [[arxiv]](https://arxiv.org/abs/1706.10295) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/5.noisy%20dqn.ipynb)\n  6. A Distributional Perspective on Reinforcement Learning [[arxiv]](https://arxiv.org/pdf/1707.06887.pdf) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/6.categorical%20dqn.ipynb)\n  7. 
Rainbow: Combining Improvements in Deep Reinforcement Learning [[arxiv]](https://arxiv.org/abs/1710.02298) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/7.rainbow%20dqn.ipynb)\n  8. Distributional Reinforcement Learning with Quantile Regression [[arxiv]](https://arxiv.org/pdf/1710.10044.pdf) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/8.quantile%20regression%20dqn.ipynb)\n  9. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation  [[arxiv]](https://arxiv.org/abs/1604.06057) [[code]](https://github.com/higgsfield/RL-Adventure/blob/master/9.hierarchical%20dqn.ipynb)\n  10. Neural Episodic Control [[arxiv]](https://arxiv.org/pdf/1703.01988.pdf) [[code]](#)\n\n\n## Policy Gradient\n\n\nUse the following command to run a saved model\n\n\n```\npython Run_Model.py\n```\n\n\nUse the following command to train the model\n\n\n```\npython pytorch_MountainCar-v0.py\n```\n\n\n\n\u003e policyNet.pkl\n\nThis is a model that I have trained.\n\n\n## Actor-Critic\n\nThis is an algorithmic framework, and the classic REINFORCE method is stored under Actor-Critic.\n \n## DDPG  \nEpisode reward in Pendulum-v0:  \n\n![ep_r](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char05%20DDPG/DDPG_exp.jpg)  \n\n\n## PPO  \n\n- Original paper: https://arxiv.org/abs/1707.06347\n- OpenAI Baselines blog post: https://blog.openai.com/openai-baselines-ppo/\n\n\n## A2C\n\nAdvantage Policy Gradient. A 2017 paper pointed out that the difference in performance between A2C and A3C is not obvious.\n\nThe Asynchronous Advantage Actor Critic method (A3C) has been very influential since the paper was published. 
The algorithm combines a few key ideas:\n\n- An updating scheme that operates on fixed-length segments of experience (say, 20 timesteps) and uses these segments to compute estimators of the returns and advantage function.\n- Architectures that share layers between the policy and value function.\n- Asynchronous updates.\n\n## A3C\n\nOriginal paper: https://arxiv.org/abs/1602.01783\n\n## SAC\n\n**This is not the implementation by the authors of the paper!**\n\nEpisode reward in Pendulum-v0:\n\n![ep_r](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char09%20SAC/SAC_ep_r_curve.png)\n\n## TD3\n\n**This is not the implementation by the authors of the paper!**  \n\nEpisode reward in Pendulum-v0:  \n\n![ep_r](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char10%20TD3/TD3_Pendulum-v0.png)  \n\nEpisode reward in BipedalWalker-v2:  \n![ep_r](https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch/blob/master/Char10%20TD3/Episode_reward_TD3_BipedakWalker.png)  \n\nIf you want to test your model:\n\n```\npython TD3_BipedalWalker-v2.py --mode test\n```\n\n## Papers Related to Deep Reinforcement Learning\n[01] [A Brief Survey of Deep Reinforcement Learning](https://arxiv.org/abs/1708.05866)  \n[02] [The Beta Policy for Continuous Control Reinforcement Learning](https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf)  \n[03] [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)  \n[04] [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)  \n[05] [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)  \n[06] [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)  \n[07] [Continuous Deep Q-Learning with Model-based Acceleration](https://arxiv.org/abs/1603.00748)  \n[08] [Asynchronous Methods for Deep Reinforcement 
Learning](https://arxiv.org/abs/1602.01783)  \n[09] [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477)  \n[10] [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)  \n[11] [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://arxiv.org/abs/1708.05144)  \n[12] [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438)  \n[13] [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)  \n[14] [Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)  \n\n## TO DO\n- [x] DDPG\n- [x] SAC\n- [x] TD3\n\n\n## Best RL courses\n- [OpenAI's Spinning Up](https://spinningup.openai.com/)  \n- [David Silver's course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)  \n- [Berkeley deep RL](http://rll.berkeley.edu/deeprlcourse/)  \n- [Practical RL](https://github.com/yandexdataschool/Practical_RL)  \n- [Deep Reinforcement Learning by Hung-yi Lee](https://www.youtube.com/playlist?list=PLJV_el3uVTsODxQFgzMzPLa16h6B8kWM_)   \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsweetice%2Fdeep-reinforcement-learning-with-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsweetice%2Fdeep-reinforcement-learning-with-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsweetice%2Fdeep-reinforcement-learning-with-pytorch/lists"}