{"id":18594480,"url":"https://github.com/applenob/rl_learn","last_synced_at":"2025-04-06T11:08:34.912Z","repository":{"id":60192884,"uuid":"104422780","full_name":"applenob/rl_learn","owner":"applenob","description":"My reinforcement learning notes and study materials :book: still updating ...","archived":false,"fork":false,"pushed_at":"2019-06-09T14:29:30.000Z","size":78254,"stargazers_count":343,"open_issues_count":0,"forks_count":117,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-30T10:06:44.572Z","etag":null,"topics":["learning-by-doing","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/applenob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-22T02:38:40.000Z","updated_at":"2025-03-21T07:22:31.000Z","dependencies_parsed_at":"2022-09-26T14:50:26.077Z","dependency_job_id":null,"html_url":"https://github.com/applenob/rl_learn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/applenob%2Frl_learn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/applenob%2Frl_learn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/applenob%2Frl_learn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/applenob%2Frl_learn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/applenob","download_url":"https://codeload.github.com/applenob/rl_learn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","
kind":"github","repositories_count":247471518,"owners_count":20944158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["learning-by-doing","reinforcement-learning"],"created_at":"2024-11-07T01:15:41.585Z","updated_at":"2025-04-06T11:08:29.903Z","avatar_url":"https://github.com/applenob.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [WIP] Reinforcement Learning Study Repository\n\nThis repository collects the classic learning materials, notes, and code I gathered while studying **reinforcement learning**, shared with everyone.\n\nTo view formulas directly in markdown files on GitHub, I recommend installing the Chrome extension [MathJax Plugin for Github](https://chrome.google.com/webstore/detail/mathjax-plugin-for-github/ioemnmodlmafdkllaclgeombjnmnbima).\n\n## Getting Started\n\n- [Getting started guide](learning_route.md)\n\n## Course Notes\n\n- [Study notes for David Silver's Reinforcement Learning course](class_note.ipynb)\n- [All slides for the course](slides)\n- Study notes for Sutton's book Reinforcement Learning: An Introduction\n  - [1. Introduction](notes/intro_note_01.md)\n  - [2. Multi-armed Bandits](notes/intro_note_02.md)\n  - [3. Finite Markov Decision Processes](notes/intro_note_03.md)\n  - [4. Dynamic Programming](notes/intro_note_04.md)\n  - [5. Monte Carlo Methods](notes/intro_note_05.md)\n  - [6. Temporal-Difference Learning](notes/intro_note_06.md)\n  - [7. n-step Bootstrapping](notes/intro_note_07.md)\n  - [8. Planning and Learning with Tabular Methods](notes/intro_note_08.md)\n  - [9. On-policy Prediction with Approximation](notes/intro_note_09.md)\n  - [10. On-policy Control with Approximation](notes/intro_note_10.md)\n  - [11. Off-policy Methods with Approximation](notes/intro_note_11.md)\n  - [12. Eligibility Traces](notes/intro_note_12.md)\n  - [13. 
Policy Gradient Methods](notes/intro_note_13.md)\n  - [14. Psychology](notes/intro_note_14.md)\n  - [15. Neuroscience](notes/intro_note_15.md)\n  - [16. Applications and Case Studies](notes/intro_note_16.md)\n  - [17. Frontiers](notes/intro_note_17.md)\n\n- [PDF versions of the book](book)\n  - [2017-6 draft](book/bookdraft2017june19.pdf)\n  - [2018 second edition](book/bookdraft2018.pdf)\n\n## Experiments\n\nAll experiment source code is in the `lib` directory and comes from [dennybritz](https://github.com/dennybritz/reinforcement-learning). On top of the original code, I added a detailed introduction to the background of each experiment and a side-by-side mapping between the code and the formulas.\n\n- [Gridworld](exp/1_gridworld.ipynb): **Dynamic Programming** for **MDP**s\n- [Blackjack](exp/2_blackjack.ipynb): **Model-Free** **Monte Carlo** Prediction and Control\n- [Windy Gridworld](exp/3_windy_gridworld.ipynb): **Model-Free** **Temporal Difference** **On-Policy Control**: **SARSA**.\n- [Cliff Walking](exp/4_cliff_walking.ipynb): **Model-Free** **Temporal Difference** **Off-Policy Control**: **Q-learning**.\n- [Mountain Car](exp/5_mountain_car.ipynb): **Q-Learning with Linear Function Approximation**, for cases where the Q-table is too large to handle (continuous state space).\n- [Atari](exp/6_atari.ipynb): **Deep Q-Learning**.\n\n## Other Important Learning Resources\n\n- [WildML blog](http://www.wildml.com/2016/10/learning-reinforcement-learning/)\n- [David Silver’s Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)\n- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/the-book-2nd.html)\n- [Python code implementations for the book](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplenob%2Frl_learn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapplenob%2Frl_learn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapplenob%2Frl_learn/lists"}