{"id":15036325,"url":"https://github.com/shangtongzhang/deeprl","last_synced_at":"2025-04-13T18:36:22.007Z","repository":{"id":41384174,"uuid":"88905488","full_name":"ShangtongZhang/DeepRL","owner":"ShangtongZhang","description":"Modularized Implementation of Deep RL Algorithms in PyTorch","archived":false,"fork":false,"pushed_at":"2024-04-16T19:43:16.000Z","size":10866,"stargazers_count":3285,"open_issues_count":7,"forks_count":692,"subscribers_count":89,"default_branch":"master","last_synced_at":"2025-04-06T15:07:46.971Z","etag":null,"topics":["a2c","categorical-dqn","ddpg","deep-reinforcement-learning","deeprl","double-dqn","dqn","dueling-network-architecture","option-critic","option-critic-architecture","ppo","prioritized-experience-replay","pytorch","quantile-regression","rainbow","td3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShangtongZhang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-20T19:59:53.000Z","updated_at":"2025-04-03T08:34:43.000Z","dependencies_parsed_at":"2022-08-10T02:07:05.483Z","dependency_job_id":null,"html_url":"https://github.com/ShangtongZhang/DeepRL","commit_stats":{"total_commits":465,"total_committers":4,"mean_commits":116.25,"dds":0.3612903225806452,"last_synced_commit":"13dd18042414ad112bd0bd383a836d8d739e8acf"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShangtongZhang%2FDeepRL","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/ShangtongZhang%2FDeepRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShangtongZhang%2FDeepRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShangtongZhang%2FDeepRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShangtongZhang","download_url":"https://codeload.github.com/ShangtongZhang/DeepRL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248761659,"owners_count":21157595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","categorical-dqn","ddpg","deep-reinforcement-learning","deeprl","double-dqn","dqn","dueling-network-architecture","option-critic","option-critic-architecture","ppo","prioritized-experience-replay","pytorch","quantile-regression","rainbow","td3"],"created_at":"2024-09-24T20:30:48.124Z","updated_at":"2025-04-13T18:36:21.982Z","avatar_url":"https://github.com/ShangtongZhang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepRL\n\n\u003e If you have any questions or want to report a bug, please open an issue instead of emailing me directly.  \n\nModularized implementation of popular deep RL algorithms in PyTorch.  
\nEasy switch between toy tasks and challenging games.\n\nImplemented algorithms:\n* (Double/Dueling/Prioritized) Deep Q-Learning (DQN)\n* Categorical DQN (C51)\n* Quantile Regression DQN (QR-DQN)\n* (Continuous/Discrete) Synchronous Advantage Actor Critic (A2C)\n* Synchronous N-Step Q-Learning (N-Step DQN)\n* Deep Deterministic Policy Gradient (DDPG)\n* Proximal Policy Optimization (PPO)\n* The Option-Critic Architecture (OC)\n* Twin Delayed DDPG (TD3)\n* [Off-PAC-KL/TruncatedETD/DifferentialGQ/MVPI/ReverseRL/COF-PAC/GradientDICE/Bi-Res-DDPG/DAC/Geoff-PAC/QUOTA/ACE](#code-of-my-papers)\n\nThe DQN agent, as well as C51 and QR-DQN, has an asynchronous actor for data generation and an asynchronous replay buffer for transferring data to the GPU.\nUsing 1 RTX 2080 Ti and 3 threads, the DQN agent runs for 10M steps (40M frames, 2.5M gradient updates) for Breakout within 6 hours.\n\n# Dependency\n* PyTorch v1.5.1\n* See ```Dockerfile``` and ```requirements.txt``` for more details\n\n# Usage\n\n```examples.py``` contains examples for all the implemented algorithms.  \n```Dockerfile``` contains the environment for generating the curves below.  \nPlease use this bibtex if you want to cite this repo:\n```\n@misc{deeprl,\n  author = {Zhang, Shangtong},\n  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},\n  year = {2018},\n  publisher = {GitHub},\n  journal = {GitHub Repository},\n  howpublished = {\\url{https://github.com/ShangtongZhang/DeepRL}},\n}\n```\n\n# Curves (commit ```9e811e```)\n\n## BreakoutNoFrameskip-v4 (1 run)\n\n![Loading...](https://raw.githubusercontent.com/ShangtongZhang/DeepRL/master/images/Breakout.png)\n\n## MuJoCo\n\n* DDPG/TD3 evaluation performance.\n![Loading...](https://raw.githubusercontent.com/ShangtongZhang/DeepRL/master/images/mujoco_eval.png)\n(5 runs, mean + standard error)\n\n* PPO online performance. 
\n![Loading...](https://raw.githubusercontent.com/ShangtongZhang/DeepRL/master/images/PPO.png)\n(5 runs, mean + standard error, smoothed by a window of size 10)\n\n\n# References\n* [Human Level Control through Deep Reinforcement Learning](https://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)\n* [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1602.01783)\n* [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)\n* [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)\n* [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)\n* [HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent](https://arxiv.org/abs/1106.5730)\n* [Deterministic Policy Gradient Algorithms](http://proceedings.mlr.press/v32/silver14.pdf)\n* [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)\n* [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438)\n* [Hybrid Reward Architecture for Reinforcement Learning](https://arxiv.org/abs/1706.04208)\n* [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477)\n* [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)\n* [Emergence of Locomotion Behaviours in Rich Environments](https://arxiv.org/abs/1707.02286)\n* [Action-Conditional Video Prediction using Deep Networks in Atari Games](https://arxiv.org/abs/1507.08750)\n* [A Distributional Perspective on Reinforcement Learning](https://arxiv.org/abs/1707.06887)\n* [Distributional Reinforcement Learning with Quantile Regression](https://arxiv.org/abs/1710.10044)\n* [The Option-Critic Architecture](https://arxiv.org/abs/1609.05140)\n* [Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)\n* Some hyper-parameters are from [DeepMind Control 
Suite](https://arxiv.org/abs/1801.00690), [OpenAI Baselines](https://github.com/openai/baselines) and [Ilya Kostrikov](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr)\n\n# Code of My Papers\n\u003e They are located in other branches of this repo and seem to be good examples for using this codebase.\n* [Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch](https://arxiv.org/abs/2111.02997) [[Off-PAC-KL](https://github.com/ShangtongZhang/DeepRL/tree/Off-PAC-KL)]\n* [Truncated Emphatic Temporal Difference Methods for Prediction and Control](https://arxiv.org/abs/2108.05338) [[TruncatedETD](https://github.com/ShangtongZhang/DeepRL/tree/TruncatedETD)]\n* [A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms](https://arxiv.org/abs/2010.01069) [[Discounting](https://github.com/ShangtongZhang/DeepRL/tree/discounting)]\n* [Breaking the Deadly Triad with a Target Network](https://arxiv.org/abs/2101.08862) [[TargetNetwork](https://github.com/ShangtongZhang/DeepRL/tree/TargetNetwork)]\n* [Average-Reward Off-Policy Policy Evaluation with Function Approximation](https://arxiv.org/abs/2101.02808) [[DifferentialGQ](https://github.com/ShangtongZhang/DeepRL/tree/DifferentialGQ)]\n* [Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning](https://arxiv.org/abs/2004.10888) [[MVPI](https://github.com/ShangtongZhang/DeepRL/tree/MVPI)]\n* [Learning Retrospective Knowledge with Reverse Reinforcement Learning](https://arxiv.org/abs/2007.06703) [[ReverseRL](https://github.com/ShangtongZhang/DeepRL/tree/ReverseRL)]\n* [Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation](https://arxiv.org/abs/1911.04384) [[COF-PAC](https://github.com/ShangtongZhang/DeepRL/tree/COF-PAC), [TD3-random](https://github.com/ShangtongZhang/DeepRL/tree/TD3-random)]\n* [GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values](https://arxiv.org/abs/2001.11113) 
[[GradientDICE](https://github.com/ShangtongZhang/DeepRL/tree/GradientDICE)]\n* [Deep Residual Reinforcement Learning](https://arxiv.org/abs/1905.01072) [[Bi-Res-DDPG](https://github.com/ShangtongZhang/DeepRL/tree/Bi-Res-DDPG)]\n* [Generalized Off-Policy Actor-Critic](https://arxiv.org/abs/1903.11329) [[Geoff-PAC](https://github.com/ShangtongZhang/DeepRL/tree/Geoff-PAC), [TD3-random](https://github.com/ShangtongZhang/DeepRL/tree/TD3-random)]\n* [DAC: The Double Actor-Critic Architecture for Learning Options](https://arxiv.org/abs/1904.12691) [[DAC](https://github.com/ShangtongZhang/DeepRL/tree/DAC)]\n* [QUOTA: The Quantile Option Architecture for Reinforcement Learning](https://arxiv.org/abs/1811.02073) [[QUOTA-discrete](https://github.com/ShangtongZhang/DeepRL/tree/QUOTA-discrete), [QUOTA-continuous](https://github.com/ShangtongZhang/DeepRL/tree/QUOTA-continuous)]\n* [ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search](https://arxiv.org/abs/1811.02696) [[ACE](https://github.com/ShangtongZhang/DeepRL/tree/ACE)]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshangtongzhang%2Fdeeprl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshangtongzhang%2Fdeeprl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshangtongzhang%2Fdeeprl/lists"}