{"id":21514142,"url":"https://github.com/allenpandas/tutorial4rl","last_synced_at":"2025-03-17T15:47:17.684Z","repository":{"id":153304652,"uuid":"586177092","full_name":"Allenpandas/Tutorial4RL","owner":"Allenpandas","description":"Tutorial4RL: Tutorial for Reinforcement Learning. 强化学习入门教程.","archived":false,"fork":false,"pushed_at":"2024-03-27T01:34:54.000Z","size":4376,"stargazers_count":128,"open_issues_count":0,"forks_count":12,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-01-24T02:19:44.124Z","etag":null,"topics":["a3c","ddpg","deep-reinforcement-learning","dqn","inverse-reinforcement-learning","multi-agent-reinforcement-learning","multi-agent-systems","policy-gradient","qlearning","reinforcement-learning","reinforcementlearning-tutorial","rl-tutorial","sarsa","tutorial"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Allenpandas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-07T07:52:38.000Z","updated_at":"2025-01-20T15:36:54.000Z","dependencies_parsed_at":"2024-02-02T06:48:56.037Z","dependency_job_id":null,"html_url":"https://github.com/Allenpandas/Tutorial4RL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Allenpandas%2FTutorial4RL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Allenpandas%2FTutorial4RL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Allenpandas%2FTutorial4RL/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Allenpandas%2FTutorial4RL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Allenpandas","download_url":"https://codeload.github.com/Allenpandas/Tutorial4RL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244060816,"owners_count":20391604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a3c","ddpg","deep-reinforcement-learning","dqn","inverse-reinforcement-learning","multi-agent-reinforcement-learning","multi-agent-systems","policy-gradient","qlearning","reinforcement-learning","reinforcementlearning-tutorial","rl-tutorial","sarsa","tutorial"],"created_at":"2024-11-23T23:42:11.019Z","updated_at":"2025-03-17T15:47:17.664Z","avatar_url":"https://github.com/Allenpandas.png","language":null,"readme":"# Tutorial4RL\nTutorial4RL: Tutorial for Reinforcement Learning. 
强化学习入门教程.\n\n## Related Repository\n\n| Repository                                                   | Remark                                                       |\n| ------------------------------------------------------------ | ------------------------------------------------------------ |\n| [Awesome-Reinforcement-Learning-Papers](https://github.com/Allenpandas/Awesome-Reinforcement-Learning-Papers) | \u003ca href=\"https://github.com/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/Awesome-Reinforcement-Learning-Papers\"\u003e\u003c/a\u003e |\n| [Tutorial4RL](https://github.com/Allenpandas/Tutorial4RL)    | \u003ca href=\"https://github.com/Allenpandas/Tutorial4RL\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/Tutorial4RL\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/Tutorial4RL\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/Tutorial4RL\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/Tutorial4RL\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/Tutorial4RL\"\u003e\u003c/a\u003e |\n| [2023-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers) | \u003ca 
href=\"https://github.com/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2023-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2022-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2022-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2021-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo 
size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2021-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2020-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2020-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2019-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" 
src=\"https://img.shields.io/github/repo-size/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2019-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2018-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" src=\"https://img.shields.io/github/repo-size/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2018-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n| [2017-Reinforcement-Learning-Conferences-Papers](https://github.com/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers) | \u003ca href=\"https://github.com/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub repo size\" 
src=\"https://img.shields.io/github/repo-size/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub Repo stars\" src=\"https://img.shields.io/github/stars/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003cimg alt=\"GitHub last commit (by committer)\" src=\"https://img.shields.io/github/last-commit/Allenpandas/2017-Reinforcement-Learning-Conferences-Papers\"\u003e\u003c/a\u003e |\n\n\n\n## Open Source Projects\n\n- PFRL, a PyTorch-based deep reinforcement learning library: [https://github.com/pfnet/pfrl](https://github.com/pfnet/pfrl)\n- Morvan Zhou's reinforcement learning tutorials in TensorFlow: [https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow](https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow)\n- PARL, the Baidu PaddlePaddle reinforcement learning framework: [https://github.com/PaddlePaddle/PARL](https://github.com/PaddlePaddle/PARL)\n- A curated list of reinforcement learning libraries on GitHub: [https://github.com/wwxFromTju/awesome-reinforcement-learning-lib](https://github.com/wwxFromTju/awesome-reinforcement-learning-lib)\n- Deep reinforcement learning repository from Udacity (an online education platform): [https://github.com/udacity/deep-reinforcement-learning](https://github.com/udacity/deep-reinforcement-learning)\n\n\n\n## Books \u0026 Videos\n\n- Deep Reinforcement Learning (《深度强化学习》) by Shusen Wang: [https://www.bilibili.com/video/BV12o4y197US](https://www.bilibili.com/video/BV12o4y197US)\n- Deep Reinforcement Learning by Hung-yi Lee: [https://www.bilibili.com/video/BV1UE411G78S](https://www.bilibili.com/video/BV1UE411G78S)\n- Reinforcement Learning in Practice from Scratch (《世界冠军带你从零实践强化学习》) by the Baidu PaddlePaddle team: [https://www.bilibili.com/video/BV1yv411i7xd](https://www.bilibili.com/video/BV1yv411i7xd)\n- Whiteboard Derivations of Reinforcement Learning (《强化学习白板推导》): [https://space.bilibili.com/97068901/channel/seriesdetail?sid=594040](https://space.bilibili.com/97068901/channel/seriesdetail?sid=594040)\n- Easy RL (《蘑菇书EasyRL》, the Mushroom Book) by Qi Wang et al.: [https://github.com/datawhalechina/easy-rl](https://github.com/datawhalechina/easy-rl)\n- Hands-on Reinforcement Learning (《动手学强化学习》) by Weinan Zhang et al.: [http://hrl.boyuai.com/](http://hrl.boyuai.com/)\n\n\n\n## Relevant Conferences\n\n| Abbr.   | Full Name                                                    | CCF Rank |\n| ------- | ------------------------------------------------------------ | :------: |\n| ICML    | International Conference on Machine Learning                 |  CCF-A   |\n| NeurIPS | Annual Conference on Neural Information Processing Systems   |  CCF-A   |\n| ICLR    | International Conference on Learning Representations         |    —     |\n| AAAI    | AAAI Conference on Artificial Intelligence                   |  CCF-A   |\n| IJCAI   | International Joint Conference on Artificial Intelligence    |  CCF-A   |\n| AAMAS   | International Joint Conference on Autonomous Agents and Multi-agent Systems |  CCF-B   |\n| ICRA    | IEEE International Conference on Robotics and Automation     |  CCF-B   |\n\n\n\n## Community\n\n- RLChina reinforcement learning community: [http://rlchina.org/](http://rlchina.org/)\n- BAAI Hub reinforcement learning column: [https://hub.baai.ac.cn/?tag_id=74](https://hub.baai.ac.cn/?tag_id=74)\n- BAAI Hub reinforcement learning weekly: [https://hub.baai.ac.cn/users/18447](https://hub.baai.ac.cn/users/18447)\n\n\n\n## Langya Rank\n\n### Domestic Langya Rank\n\n| Name           | Organization                                         | Link                                                      | Focus                                                              |\n| -------------- | ---------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------ |\n| Jianye Hao     | Tianjin University                                   | [[HomePage](http://www.icdai.org/jianye.html)]            | Multi-agent reinforcement learning, game theory                    |\n| Haifeng Zhang  | Institute of Automation, Chinese Academy of Sciences | [[HomePage](https://people.ucas.edu.cn/~zhf)]             | Multi-agent reinforcement learning, agent gaming, agent evaluation |\n| Jun Luo        | Huawei Noah's Ark Lab                                | [[HomePage](https://openreview.net/profile?id=~Jun_Luo1)] | Autonomous driving, reinforcement learning                         |\n| Xiangfeng Wang | East China Normal University                         | [[HomePage](https://mail-ecnu.cn/people/xfwang)]          | Multi-agent reinforcement learning                                 |\n| Yang Yu        | Nanjing University                                   | [[HomePage](https://www.yuque.com/eyounx/home)]           | Reinforcement learning, offline reinforcement learning             |\n| Yaodong Yang   | Peking University                                    | [[HomePage](https://www.yangyaodong.com/)]                | Multi-agent reinforcement learning, game theory                    |\n| Zongqing Lu    | Peking University                                    | [[HomePage](https://z0ngqing.github.io/)]                 | Reinforcement learning                                             |\n| Chongjie Zhang | Tsinghua University                                  | [[HomePage](http://people.iiis.tsinghua.edu.cn/~zhang/)]  | Deep reinforcement learning, multi-agent systems                   |\n\n\n\n### International Langya Rank\n\n| Name                 | Organization                                                 |                             Link                             |\n| -------------------- | ------------------------------------------------------------ | :----------------------------------------------------------: |\n| Sergey Levine        | UC Berkeley                                                  | [[Google Scholar](https://scholar.google.com/citations?user=8R35rCwAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Pieter Abbeel        | UC Berkeley                                                  | [[Google Scholar](https://scholar.google.com/citations?user=vtwH6GkAAAAJ\u0026hl=zh-CN)] |\n| Matthew E. 
Taylor    | University of Alberta                                        | [[Google Scholar](https://scholar.google.com/citations?user=edQgLXcAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Peter Stone          | University of Texas at Austin                                | [[Google Scholar](https://scholar.google.com/citations?user=qnwjcfAAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Shimon Whiteson      | University of Oxford / Waymo                                 | [[Google Scholar](https://scholar.google.com/citations?user=9zeEI-cAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Jan Peters           | German AI Research Center                                    | [[Google Scholar](https://scholar.google.com/citations?user=-kIVAcAAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Shie Mannor          | Nvidia                                                       | [[Google Scholar](https://scholar.google.com/citations?user=q1HlbIUAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Chelsea Finn         | Stanford University / Google                                 | [[Google Scholar](https://scholar.google.com/citations?user=vfPE6hgAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Dusit Niyato         |                                                              |                       [Google Scholar]                       |\n| Doina Precup         | DeepMind / McGill University                                 | [[Google Scholar](https://scholar.google.com/citations?user=j54VcVEAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Ann Nowé             |                                                              | [[Google Scholar](https://scholar.google.com/citations?user=LH5QKbgAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Marcello Restelli    | Politecnico di Milano                                        | [[Google Scholar](https://scholar.google.com/citations?user=xdgxRiEAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Frank L. 
Lewis       |                                                              |                       [Google Scholar]                       |\n| H. Vincent Poor      |                                                              |                       [Google Scholar]                       |\n| Vaneet Aggarwal      | Purdue University                                            | [[Google Scholar](https://scholar.google.com/citations?user=Tu4lmGwAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| F. Richard Yu        | Carleton University                                          | [[Google Scholar](https://scholar.google.com/citations?user=zuGMGBoAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Jun Wang             | University College London                                    | [[Google Scholar](https://scholar.google.com/citations?hl=zh-CN\u0026user=wIE1tY4AAAAJ)] |\n| Michael L. Littman   |                                                              |                       [Google Scholar]                       |\n| Satinder Singh       | University of Michigan                                       |                       [Google Scholar]                       |\n| Mehdi Bennis         |                                                              |                       [Google Scholar]                       |\n| David Silver         | University College London / DeepMind                         |                       [Google Scholar]                       |\n| Rémi Munos           |                                                              |                       [Google Scholar]                       |\n| Marc G. Bellemare    |                                                              |                       [Google Scholar]                       |\n| Joelle Pineau        | McGill University / Meta AI                                  | [[Google Scholar](https://scholar.google.com/citations?user=CEt6_mMAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Martin A. 
Riedmiller | Google                                                       | [[Google Scholar](https://scholar.google.com/citations?hl=zh-CN\u0026user=1gVfqpcAAAAJ\u0026view_op=list_works\u0026sortby=pubdate)] |\n| Mohsen Guizani       | Mohamed Bin Zayed University of Artificial Intelligence      | [[Google Scholar](https://scholar.google.com/citations?hl=zh-CN\u0026user=RigrYkcAAAAJ\u0026view_op=list_works\u0026sortby=pubdate)] |\n| Stefan Wermter       | University of Hamburg                                        | [[Google Scholar](https://scholar.google.com/citations?user=uIeaxuAAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Ying-Chang Liang     |                                                              | [[Google Scholar](https://scholar.google.com/citations?user=HybIiJ8AAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Jonathan P. How      |                                                              |                       [Google Scholar]                       |\n| Ivana Dusparic       | Trinity College Dublin                                       | [[Google Scholar](https://scholar.google.com/citations?user=CrGGAccAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Robert Babuska       | Delft University of Technology / Czech Technical University Prague | [[Google Scholar](https://scholar.google.com/citations?user=0orN2FUAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Emma Brunskill       | Stanford University                                          | [[Google Scholar](https://scholar.google.com/citations?user=HaN8b2YAAAAJ\u0026hl=zh-CN\u0026oi=ao)] |\n| Bo An                | Nanyang Technological University                             | [[Google Scholar](https://scholar.google.com/citations?user=PEEpuNwAAAAJ\u0026hl=zh-CN\u0026oi=ao)] 
|\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenpandas%2Ftutorial4rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallenpandas%2Ftutorial4rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenpandas%2Ftutorial4rl/lists"}