{"id":13508171,"url":"https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow","last_synced_at":"2025-03-30T09:33:30.688Z","repository":{"id":37390891,"uuid":"90433420","full_name":"MorvanZhou/Reinforcement-learning-with-tensorflow","owner":"MorvanZhou","description":"Simple Reinforcement learning tutorials, 莫烦Python 中文AI教学","archived":false,"fork":false,"pushed_at":"2024-03-31T05:40:51.000Z","size":438,"stargazers_count":9125,"open_issues_count":69,"forks_count":5036,"subscribers_count":290,"default_branch":"master","last_synced_at":"2025-03-26T01:07:02.545Z","etag":null,"topics":["a3c","actor-critic","asynchronous-advantage-actor-critic","ddpg","deep-deterministic-policy-gradient","deep-q-network","double-dqn","dqn","dueling-dqn","machine-learning","policy-gradient","ppo","prioritized-replay","proximal-policy-optimization","q-learning","reinforcement-learning","sarsa","sarsa-lambda","tensorflow-tutorials","tutorial"],"latest_commit_sha":null,"homepage":"https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MorvanZhou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-06T03:01:31.000Z","updated_at":"2025-03-25T11:25:29.000Z","dependencies_parsed_at":"2022-07-16T16:17:02.108Z","dependency_job_id":"733ee4d6-4924-4fd0-84cd-7256718156fd","html_url":"https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow","commit_stats":{"total_commits":105,"total_committers":11,"mean_commits":9.545454545454545,"dds":"0.33333333333333337","l
ast_synced_commit":"93e333484be0e262d28b65507b6a1d002424a056"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FReinforcement-learning-with-tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FReinforcement-learning-with-tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FReinforcement-learning-with-tensorflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MorvanZhou%2FReinforcement-learning-with-tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MorvanZhou","download_url":"https://codeload.github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246301963,"owners_count":20755512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a3c","actor-critic","asynchronous-advantage-actor-critic","ddpg","deep-deterministic-policy-gradient","deep-q-network","double-dqn","dqn","dueling-dqn","machine-learning","policy-gradient","ppo","prioritized-replay","proximal-policy-optimization","q-learning","reinforcement-learning","sarsa","sarsa-lambda","tensorflow-tutorials","tutorial"],"created_at":"2024-08-01T02:00:49.256Z","updated_at":"2025-03-30T09:33:30.423Z","avatar_url":"https://github.com/MorvanZhou.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n    \u003ca 
href=\"https://www.youtube.com/watch?v=pieI7rOXELI\u0026list=PLXO45tsB95cIplu-fLMpUEEZTwrDNh6Ba\" target=\"_blank\"\u003e\n    \u003cimg width=\"60%\" src=\"https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/RL_cover.jpg\" style=\"max-width:100%;\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cbr\u003e\n\n# Reinforcement Learning Methods and Tutorials\n\nThese reinforcement learning tutorials cover everything from basic RL algorithms to advanced algorithms developed in recent years.\n\n**If you speak Chinese, visit [莫烦 Python](https://mofanpy.com) or my [YouTube channel](https://www.youtube.com/channel/UCdyjiB5H8Pu7aDTNVXTTpcg) for more.**\n\n**Since many have requested English versions of these tutorials, you can find them in this playlist:** ([https://www.youtube.com/playlist?list=PLXO45tsB95cIplu-fLMpUEEZTwrDNh6Ba](https://www.youtube.com/playlist?list=PLXO45tsB95cIplu-fLMpUEEZTwrDNh6Ba))\n\n# Table of Contents\n\n* Tutorials\n    * [Simple entry example](contents/1_command_line_reinforcement_learning)\n    * [Q-learning](contents/2_Q_Learning_maze)\n    * [Sarsa](contents/3_Sarsa_maze)\n    * [Sarsa(lambda)](contents/4_Sarsa_lambda_maze)\n    * [Deep Q Network (DQN)](contents/5_Deep_Q_Network)\n    * [Using OpenAI Gym](contents/6_OpenAI_gym)\n    * [Double DQN](contents/5.1_Double_DQN)\n    * [DQN with Prioritized Experience Replay](contents/5.2_Prioritized_Replay_DQN)\n    * [Dueling DQN](contents/5.3_Dueling_DQN)\n    * [Policy Gradients](contents/7_Policy_gradient_softmax)\n    * [Actor-Critic](contents/8_Actor_Critic_Advantage)\n    * [Deep Deterministic Policy Gradient (DDPG)](contents/9_Deep_Deterministic_Policy_Gradient_DDPG)\n    * [A3C](contents/10_A3C)\n    * [Dyna-Q](contents/11_Dyna_Q)\n    * [Proximal Policy Optimization (PPO)](contents/12_Proximal_Policy_Optimization)\n    * [Curiosity Model](/contents/Curiosity_Model), [Random Network Distillation 
(RND)](/contents/Curiosity_Model/Random_Network_Distillation.py)\n* [Some of my experiments](experiments)\n    * [2D Car](experiments/2D_car)\n    * [Robot arm](experiments/Robot_arm)\n    * [BipedalWalker](experiments/Solve_BipedalWalker)\n    * [LunarLander](experiments/Solve_LunarLander)\n\n# Some RL Networks\n### [Deep Q Network](contents/5_Deep_Q_Network)\n\n\u003ca href=\"contents/5_Deep_Q_Network\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/4-3-2.png\"\u003e\n\u003c/a\u003e\n\n### [Double DQN](contents/5.1_Double_DQN)\n\n\u003ca href=\"contents/5.1_Double_DQN\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/4-5-3.png\"\u003e\n\u003c/a\u003e\n\n### [Dueling DQN](contents/5.3_Dueling_DQN)\n\n\u003ca href=\"contents/5.3_Dueling_DQN\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/4-7-4.png\"\u003e\n\u003c/a\u003e\n\n### [Actor Critic](contents/8_Actor_Critic_Advantage)\n\n\u003ca href=\"contents/8_Actor_Critic_Advantage\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/6-1-1.png\"\u003e\n\u003c/a\u003e\n\n### [Deep Deterministic Policy Gradient](contents/9_Deep_Deterministic_Policy_Gradient_DDPG)\n\n\u003ca href=\"contents/9_Deep_Deterministic_Policy_Gradient_DDPG\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/6-2-2.png\"\u003e\n\u003c/a\u003e\n\n### [A3C](contents/10_A3C)\n\n\u003ca href=\"contents/10_A3C\"\u003e\n    \u003cimg class=\"course-image\" src=\"https://mofanpy.com/static/results/reinforcement-learning/6-3-2.png\"\u003e\n\u003c/a\u003e\n\n### [Proximal Policy Optimization (PPO)](contents/12_Proximal_Policy_Optimization)\n\n\u003ca href=\"contents/12_Proximal_Policy_Optimization\"\u003e\n    \u003cimg class=\"course-image\" 
src=\"https://mofanpy.com/static/results/reinforcement-learning/6-4-3.png\"\u003e\n\u003c/a\u003e\n\n### [Curiosity Model](/contents/Curiosity_Model)\n\n\u003ca href=\"/contents/Curiosity_Model\"\u003e\n    \u003cimg class=\"course-image\" src=\"/contents/Curiosity_Model/Curiosity.png\"\u003e\n\u003c/a\u003e\n\n# Donation\n\n*If this helps you, please consider donating to support me in making better tutorials. Any contribution is greatly appreciated!*\n\n\u003cdiv \u003e\n  \u003ca href=\"https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026amp;business=morvanzhou%40gmail%2ecom\u0026amp;lc=C2\u0026amp;item_name=MorvanPython\u0026amp;currency_code=AUD\u0026amp;bn=PP%2dDonationsBF%3abtn_donateCC_LG%2egif%3aNonHosted\"\u003e\n    \u003cimg style=\"border-radius: 20px;  box-shadow: 0px 0px 10px 1px  #888888;\"\n         src=\"https://www.paypalobjects.com/webstatic/en_US/i/btn/png/silver-pill-paypal-44px.png\"\n         alt=\"Paypal\"\n         height=\"auto\" \u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv\u003e\n  \u003ca href=\"https://www.patreon.com/morvan\"\u003e\n    \u003cimg src=\"https://mofanpy.com/static/img/support/patreon.jpg\"\n         alt=\"Patreon\"\n         height=120\u003e\u003c/a\u003e\n\u003c/div\u003e\n","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026amp;business=morvanzhou%40gmail%2ecom\u0026amp;lc=C2\u0026amp;item_name=MorvanPython\u0026amp;currency_code=AUD\u0026amp;bn=PP%2dDonationsBF%3abtn_donateCC_LG%2egif%3aNonHosted","https://www.patreon.com/morvan"],"categories":["Python","Tutorials","Table of Contents","Machine Learning Tutorials"],"sub_categories":["ML","Data 
Management"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMorvanZhou%2FReinforcement-learning-with-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMorvanZhou%2FReinforcement-learning-with-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMorvanZhou%2FReinforcement-learning-with-tensorflow/lists"}