{"id":13689254,"url":"https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch","last_synced_at":"2025-05-01T23:33:16.020Z","repository":{"id":38271776,"uuid":"147824034","full_name":"p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch","owner":"p-christ","description":"PyTorch implementations of deep reinforcement learning algorithms and environments","archived":false,"fork":false,"pushed_at":"2024-07-25T10:14:35.000Z","size":31575,"stargazers_count":5623,"open_issues_count":47,"forks_count":1197,"subscribers_count":106,"default_branch":"master","last_synced_at":"2024-10-29T14:59:41.138Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/p-christ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-09-07T13:01:21.000Z","updated_at":"2024-10-29T10:01:02.000Z","dependencies_parsed_at":"2022-06-27T21:31:25.056Z","dependency_job_id":"a555780c-abc2-4c62-a5b5-b6b78f6e714f","html_url":"https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/p-christ","download_url":"https://codeload.github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224282194,"owners_count":17285786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T15:01:40.233Z","updated_at":"2024-11-12T13:31:11.202Z","avatar_url":"https://github.com/p-christ.png","language":"Python","readme":"# Deep Reinforcement Learning Algorithms with PyTorch\n\n![Travis CI](https://travis-ci.org/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch.svg?branch=master)\n[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/dwyl/esta/issues)\n\n\n\n![RL](utilities/RL_image.jpeg)   ![PyTorch](utilities/PyTorch-logo-2.jpg)\n\nThis repository contains PyTorch implementations of deep reinforcement learning algorithms and environments. \n\n(To help you remember things you learn about machine learning in general write them in [Gizmo](https://gizmo.ai))\n## **Algorithms Implemented**  \n\n1. *Deep Q Learning (DQN)* \u003csub\u003e\u003csup\u003e ([Mnih et al. 2013](https://arxiv.org/pdf/1312.5602.pdf)) \u003c/sup\u003e\u003c/sub\u003e  \n1. *DQN with Fixed Q Targets* \u003csub\u003e\u003csup\u003e ([Mnih et al. 
2013](https://arxiv.org/pdf/1312.5602.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Double DQN (DDQN)* \u003csub\u003e\u003csup\u003e ([Hado van Hasselt et al. 2015](https://arxiv.org/pdf/1509.06461.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DDQN with Prioritised Experience Replay* \u003csub\u003e\u003csup\u003e ([Schaul et al. 2016](https://arxiv.org/pdf/1511.05952.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Dueling DDQN* \u003csub\u003e\u003csup\u003e ([Wang et al. 2016](http://proceedings.mlr.press/v48/wangf16.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *REINFORCE* \u003csub\u003e\u003csup\u003e ([Williams et al. 1992](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Deep Deterministic Policy Gradients (DDPG)* \u003csub\u003e\u003csup\u003e ([Lillicrap et al. 2016](https://arxiv.org/pdf/1509.02971.pdf) ) \u003c/sup\u003e\u003c/sub\u003e\n1. *Twin Delayed Deep Deterministic Policy Gradients (TD3)* \u003csub\u003e\u003csup\u003e ([Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Soft Actor-Critic (SAC)* \u003csub\u003e\u003csup\u003e ([Haarnoja et al. 2018](https://arxiv.org/pdf/1812.05905.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Soft Actor-Critic for Discrete Actions (SAC-Discrete)* \u003csub\u003e\u003csup\u003e ([Christodoulou 2019](https://arxiv.org/abs/1910.07207)) \u003c/sup\u003e\u003c/sub\u003e \n1. *Asynchronous Advantage Actor Critic (A3C)* \u003csub\u003e\u003csup\u003e ([Mnih et al. 2016](https://arxiv.org/pdf/1602.01783.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Synchronous Advantage Actor Critic (A2C)*\n1. *Proximal Policy Optimisation (PPO)* \u003csub\u003e\u003csup\u003e ([Schulman et al. 2017](https://openai-public.s3-us-west-2.amazonaws.com/blog/2017-07/ppo/ppo-arxiv.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DQN with Hindsight Experience Replay (DQN-HER)* \u003csub\u003e\u003csup\u003e ([Andrychowicz et al. 
2018](https://arxiv.org/pdf/1707.01495.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DDPG with Hindsight Experience Replay (DDPG-HER)* \u003csub\u003e\u003csup\u003e ([Andrychowicz et al. 2018](https://arxiv.org/pdf/1707.01495.pdf) ) \u003c/sup\u003e\u003c/sub\u003e\n1. *Hierarchical-DQN (h-DQN)* \u003csub\u003e\u003csup\u003e ([Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL)* \u003csub\u003e\u003csup\u003e ([Florensa et al. 2017](https://arxiv.org/pdf/1704.03012.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Diversity Is All You Need (DIAYN)* \u003csub\u003e\u003csup\u003e ([Eysenbach et al. 2018](https://arxiv.org/pdf/1802.06070.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n\nAll implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), \nBit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add more hierarchical RL algorithms soon.\n\n## **Environments Implemented**\n\n1. *Bit Flipping Game* \u003csub\u003e\u003csup\u003e (as described in [Andrychowicz et al. 2018](https://arxiv.org/pdf/1707.01495.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Four Rooms Game* \u003csub\u003e\u003csup\u003e (as described in [Sutton et al. 1998](http://www-anw.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Long Corridor Game* \u003csub\u003e\u003csup\u003e (as described in [Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Ant-{Maze, Push, Fall}* \u003csub\u003e\u003csup\u003e (as described in [Nachum et al. 2018](https://arxiv.org/pdf/1805.08296.pdf) and their accompanying [code](https://github.com/tensorflow/models/tree/master/research/efficient-hrl)) \u003c/sup\u003e\u003c/sub\u003e\n\n## **Results**\n\n#### 1. 
Cart Pole and Mountain Car\n\nThe graphs below show various RL algorithms successfully learning the discrete action game [Cart Pole](https://github.com/openai/gym/wiki/CartPole-v0)\n or the continuous action game [Mountain Car](https://github.com/openai/gym/wiki/MountainCarContinuous-v0). The mean result from running the algorithms \n with 3 random seeds is shown, with the shaded area representing plus and minus 1 standard deviation. The hyperparameters\n used can be found in the files `results/Cart_Pole.py` and `results/Mountain_Car.py`. \n \n![Cart Pole and Mountain Car Results](results/data_and_graphs/CartPole_and_MountainCar_Graph.png) \n\n\n#### 2. Hindsight Experience Replay (HER) Experiments\n\nThe graphs below show the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) \nand Fetch Reach environments described in the papers [Hindsight Experience Replay 2018](https://arxiv.org/pdf/1707.01495.pdf) \nand [Multi-Goal Reinforcement Learning 2018](https://arxiv.org/abs/1802.09464). The results replicate those reported in \nthe papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents, so the only difference \nbetween them was whether hindsight was used or not. \n\n![HER Experiment Results](results/data_and_graphs/HER_Experiments.png)\n\n#### 3. Hierarchical Reinforcement Learning Experiments\n\nThe results on the left below show the performance of DQN and the algorithm hierarchical-DQN from [Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf)\non the Long Corridor environment also explained in [Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf). The environment\nrequires the agent to go to the end of a corridor before coming back in order to receive a larger reward. 
This delayed \ngratification and the aliasing of states make it a nearly impossible game for DQN to learn, but if we introduce a \nmeta-controller (as in h-DQN) that directs how a lower-level controller behaves, we are able to make more progress. This \naligns with the results found in the paper. \n\nThe results on the right show the performance of DDQN and the algorithm Stochastic NNs for Hierarchical Reinforcement Learning \n(SNN-HRL) from [Florensa et al. 2017](https://arxiv.org/pdf/1704.03012.pdf). DDQN is used as the comparison because\nthe implementation of SNN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training\nfor SNN-HRL were used for pre-training, which is why there is no reward for those episodes. \n \n![Long Corridor and Four Rooms](results/data_and_graphs/Four_Rooms_and_Long_Corridor.png)\n     \n\n### Usage ###\n\nThe repository's high-level structure is:\n \n    ├── agents                    \n        ├── actor_critic_agents   \n        ├── DQN_agents         \n        ├── policy_gradient_agents\n        └── stochastic_policy_search_agents \n    ├── environments   \n    ├── results             \n        └── data_and_graphs        \n    ├── tests\n    ├── utilities             \n        └── data structures            \n   \n\n#### i) To watch the agents learn the above games  \n\nTo watch all the different agents learn Cart Pole, follow these steps:\n\n```commandline\ngit clone https://github.com/p-christ/Deep_RL_Implementations.git\ncd Deep_RL_Implementations\n\nconda create --name myenvname\ny\nconda activate myenvname\n\npip3 install -r requirements.txt\n\npython results/Cart_Pole.py\n``` \n\nFor other games, change the last line to one of the other files in the `results` folder. \n\n#### ii) To train the agents on another game  \n\nMost OpenAI Gym environments should work. All you need to do is change the `config.environment` field (see `results/Cart_Pole.py` for an example). 
\n\nYou can also play with your own custom game if you create a separate class that inherits from `gym.Env`. See `environments/Four_Rooms_Environment.py`\nfor an example of a custom environment, and then see the script `results/Four_Rooms.py` for how to have agents play the environment.\n","funding_links":[],"categories":["Python","时间序列","Paper implementations｜论文实现","Paper implementations"],"sub_categories":["网络服务_其他","Other libraries｜其他库:","Other libraries:"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fp-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp-christ%2FDeep-Reinforcement-Learning-Algorithms-with-PyTorch/lists"}