{"id":13444445,"url":"https://github.com/dennybritz/reinforcement-learning","last_synced_at":"2025-05-11T05:45:49.141Z","repository":{"id":37385064,"uuid":"66483240","full_name":"dennybritz/reinforcement-learning","owner":"dennybritz","description":"Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.","archived":false,"fork":false,"pushed_at":"2023-07-13T09:52:27.000Z","size":5375,"stargazers_count":21250,"open_issues_count":114,"forks_count":6116,"subscribers_count":864,"default_branch":"master","last_synced_at":"2025-05-11T05:45:45.125Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://www.wildml.com/2016/10/learning-reinforcement-learning/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dennybritz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-08-24T17:02:41.000Z","updated_at":"2025-05-11T04:44:11.000Z","dependencies_parsed_at":"2023-10-20T20:00:17.847Z","dependency_job_id":null,"html_url":"https://github.com/dennybritz/reinforcement-learning","commit_stats":{"total_commits":196,"total_committers":47,"mean_commits":4.170212765957447,"dds":"0.45408163265306123","last_synced_commit":"2b832284894a65eccdd82353cc446f68d100676e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dennybritz%2Freinforcement-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dennybritz%2Freinforcement-learning/tags","releases_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dennybritz%2Freinforcement-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dennybritz%2Freinforcement-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dennybritz","download_url":"https://codeload.github.com/dennybritz/reinforcement-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253523720,"owners_count":21921818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T04:00:23.288Z","updated_at":"2025-05-11T05:45:49.111Z","avatar_url":"https://github.com/dennybritz.png","language":"Jupyter Notebook","funding_links":[],"categories":["Uncategorized","Tutorials","Jupyter Notebook","Table of Contents","RL","强化学习实战资源","时间序列","Reinforcement Learning","Machine Learning","Coding \u0026 Development"],"sub_categories":["Uncategorized","Implementation of Algorithms","网络服务_其他","Ukraine","Courses"],"readme":"### Overview\n\nThis repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from\n\n- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/RLbook2018.pdf)\n- [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)\n\nEach folder in corresponds to one or more chapters of the above textbook and/or course. 
In addition to exercises and solutions, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.\n\nAll code is written in Python 3 and uses RL environments from [OpenAI Gym](https://gym.openai.com/). Advanced techniques use [Tensorflow](https://www.tensorflow.org/) for neural network implementations.\n\n\n### Table of Contents\n\n- [Introduction to RL problems \u0026 OpenAI Gym](Introduction/)\n- [MDPs and Bellman Equations](MDP/)\n- [Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration](DP/)\n- [Monte Carlo Model-Free Prediction \u0026 Control](MC/)\n- [Temporal Difference Model-Free Prediction \u0026 Control](TD/)\n- [Function Approximation](FA/)\n- [Deep Q Learning](DQN/) (WIP)\n- [Policy Gradient Methods](PolicyGradient/) (WIP)\n- Learning and Planning (WIP)\n- Exploration and Exploitation (WIP)\n\n\n### List of Implemented Algorithms\n\n- [Dynamic Programming Policy Evaluation](DP/Policy%20Evaluation%20Solution.ipynb)\n- [Dynamic Programming Policy Iteration](DP/Policy%20Iteration%20Solution.ipynb)\n- [Dynamic Programming Value Iteration](DP/Value%20Iteration%20Solution.ipynb)\n- [Monte Carlo Prediction](MC/MC%20Prediction%20Solution.ipynb)\n- [Monte Carlo Control with Epsilon-Greedy Policies](MC/MC%20Control%20with%20Epsilon-Greedy%20Policies%20Solution.ipynb)\n- [Monte Carlo Off-Policy Control with Importance Sampling](MC/Off-Policy%20MC%20Control%20with%20Weighted%20Importance%20Sampling%20Solution.ipynb)\n- [SARSA (On Policy TD Learning)](TD/SARSA%20Solution.ipynb)\n- [Q-Learning (Off Policy TD Learning)](TD/Q-Learning%20Solution.ipynb)\n- [Q-Learning with Linear Function Approximation](FA/Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)\n- [Deep Q-Learning for Atari Games](DQN/Deep%20Q%20Learning%20Solution.ipynb)\n- [Double Deep-Q Learning for Atari Games](DQN/Double%20DQN%20Solution.ipynb)\n- Deep Q-Learning with Prioritized Experience Replay (WIP)\n- 
[Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk%20REINFORCE%20with%20Baseline%20Solution.ipynb)\n- [Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk%20Actor%20Critic%20Solution.ipynb)\n- [Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces](PolicyGradient/Continuous%20MountainCar%20Actor%20Critic%20Solution.ipynb)\n- Deterministic Policy Gradients for Continuous Action Spaces (WIP)\n- Deep Deterministic Policy Gradients (DDPG) (WIP)\n- [Asynchronous Advantage Actor Critic (A3C)](PolicyGradient/a3c)\n\n\n### Resources\n\nTextbooks:\n\n- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/RLbook2018.pdf)\n\nClasses:\n\n- [David Silver's Reinforcement Learning Course (UCL, 2015)](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)\n- [CS294 - Deep Reinforcement Learning (Berkeley, Fall 2015)](http://rll.berkeley.edu/deeprlcourse/)\n- [CS 8803 - Reinforcement Learning (Georgia Tech)](https://www.udacity.com/course/reinforcement-learning--ud600)\n- [CS885 - Reinforcement Learning (UWaterloo), Spring 2018](https://cs.uwaterloo.ca/~ppoupart/teaching/cs885-spring18/)\n- [CS294-112 - Deep Reinforcement Learning (UC Berkeley)](http://rail.eecs.berkeley.edu/deeprlcourse/)\n\nTalks/Tutorials:\n\n- [Introduction to Reinforcement Learning (Joelle Pineau @ Deep Learning Summer School 2016)](http://videolectures.net/deeplearning2016_pineau_reinforcement_learning/)\n- [Deep Reinforcement Learning (Pieter Abbeel @ Deep Learning Summer School 2016)](http://videolectures.net/deeplearning2016_abbeel_deep_reinforcement/)\n- [Deep Reinforcement Learning ICML 2016 Tutorial (David Silver)](http://techtalks.tv/talks/deep-reinforcement-learning/62360/)\n- [Tutorial: Introduction to Reinforcement Learning with Function Approximation](https://www.youtube.com/watch?v=ggqnxyjaKe4)\n- [John Schulman - Deep Reinforcement Learning (4 
Lectures)](https://www.youtube.com/playlist?list=PLjKEIQlKCTZYN3CYBlj8r58SbNorobqcp)\n- [Deep Reinforcement Learning Slides @ NIPS 2016](http://people.eecs.berkeley.edu/~pabbeel/nips-tutorial-policy-optimization-Schulman-Abbeel.pdf)\n- [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/user/introduction.html)\n- [Advanced Deep Learning \u0026 Reinforcement Learning (UCL 2018, DeepMind)](https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)\n- [Deep RL Bootcamp](https://sites.google.com/view/deep-rl-bootcamp/lectures)\n\nOther Projects:\n\n- [carpedm20/deep-rl-tensorflow](https://github.com/carpedm20/deep-rl-tensorflow)\n- [matthiasplappert/keras-rl](https://github.com/matthiasplappert/keras-rl)\n\nSelected Papers:\n\n- [Human-Level Control through Deep Reinforcement Learning (2015-02)](http://www.readcube.com/articles/10.1038/nature14236)\n- [Deep Reinforcement Learning with Double Q-learning (2015-09)](http://arxiv.org/abs/1509.06461)\n- [Continuous control with deep reinforcement learning (2015-09)](https://arxiv.org/abs/1509.02971)\n- [Prioritized Experience Replay (2015-11)](http://arxiv.org/abs/1511.05952)\n- [Dueling Network Architectures for Deep Reinforcement Learning (2015-11)](http://arxiv.org/abs/1511.06581)\n- [Asynchronous Methods for Deep Reinforcement Learning (2016-02)](http://arxiv.org/abs/1602.01783)\n- [Deep Reinforcement Learning from Self-Play in Imperfect-Information Games (2016-03)](http://arxiv.org/abs/1603.01121)\n- [Mastering the game of Go with deep neural networks and tree search](https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdennybritz%2Freinforcement-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdennybritz%2Freinforcement-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdennybritz%2Freinforcement-learning/lists"}