{"id":26659376,"url":"https://github.com/sshkhr/practical_rl","last_synced_at":"2025-04-11T14:41:11.953Z","repository":{"id":45305132,"uuid":"132923061","full_name":"sshkhr/Practical_RL","owner":"sshkhr","description":"My solutions to Yandex Practical Reinforcement Learning course in PyTorch and Tensorflow ","archived":false,"fork":false,"pushed_at":"2021-12-22T20:23:43.000Z","size":10396,"stargazers_count":54,"open_issues_count":2,"forks_count":25,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-25T10:45:09.853Z","etag":null,"topics":["bandit-algorithms","deep-reinforcement-learning","evolutionary-algorithms","markov-decision-processes","monte-carlo-sampling","policy-gradient","pytorch","reinforcement-learning","td-learning","tensorflow"],"latest_commit_sha":null,"homepage":"https://github.com/yandexdataschool/Practical_RL","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sshkhr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-10T15:56:20.000Z","updated_at":"2025-03-14T02:01:05.000Z","dependencies_parsed_at":"2022-08-04T13:30:35.189Z","dependency_job_id":null,"html_url":"https://github.com/sshkhr/Practical_RL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshkhr%2FPractical_RL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshkhr%2FPractical_RL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshkhr%2FPractical_RL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshkhr%2FPractical_RL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sshkhr","download_url":"https://codeload.github.com/sshkhr/Practical_RL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248420166,"owners_count":21100324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bandit-algorithms","deep-reinforcement-learning","evolutionary-algorithms","markov-decision-processes","monte-carlo-sampling","policy-gradient","pytorch","reinforcement-learning","td-learning","tensorflow"],"created_at":"2025-03-25T10:45:14.664Z","updated_at":"2025-04-11T14:41:11.934Z","avatar_url":"https://github.com/sshkhr.png","language":"Jupyter Notebook","readme":"# Practical_RL\nA course on reinforcement learning in the wild.\nTaught on-campus at [HSE](https://cs.hse.ru) and [YSDA](https://yandexdataschool.com/)  and maintained to be friendly to online students (both english and russian).\n\n\n#### Manifesto:\n* __Optimize for the curious.__ For all the materials that aren’t covered in detail there are links to more information and related 
* __Practicality first.__ Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you "feel" it on a practical problem.
* __Git-course.__ Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! [Pull-request](https://help.github.com/articles/about-pull-requests/) it!

# Course info
* Lecture slides are [here](https://yadi.sk/d/loPpY45J3EAYfU).
* Telegram chat room for YSDA & HSE students is [here](https://t.me/rlspring18)
* Grading rules for YSDA & HSE students are [here](https://github.com/yandexdataschool/Practical_RL/wiki/Homeworks-and-grading)
* Online student __[survival guide](https://github.com/yandexdataschool/Practical_RL/wiki/Online-student's-survival-guide)__
* Installing the libraries - [guide and issues thread](https://github.com/yandexdataschool/Practical_RL/issues/1)
* Magical button that launches you into the course environment:
    * [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/yandexdataschool/Practical_RL/master) - comes with all libraries pre-installed. May be down from time to time.
    * If it's down, try [__Google Colab__](https://colab.research.google.com/) or [__Azure Notebooks__](http://notebooks.azure.com/). Those last longer, but they will require you to run installer commands (see ./Dockerfile).
* Anonymous [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSdurWw97Sm9xCyYwC8g3iB5EibITnoPJW2IkOVQYE_kcXPh6Q/viewform) for everything that didn't go through e-mail.
* [About the course](https://github.com/yandexdataschool/Practical_RL/wiki/Practical-RL)

# Additional materials
* A large list of RL materials - [awesome rl](https://github.com/aikorea/awesome-rl)
* [RL reading group](https://github.com/yandexdataschool/Practical_RL/wiki/RL-reading-group)


# Syllabus

The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

* [__week1__](https://github.com/yandexdataschool/Practical_RL/tree/master/week1_intro) RL as black-box optimization
  * Lecture: RL problems around us. Decision processes. Stochastic optimization, crossentropy method. Parameter space search vs action space search.
  * Seminar: Welcome to OpenAI Gym. Tabular CEM for Taxi-v0, deep CEM for box2d environments.
  * Homework description - see week1/README.md.
  * **YSDA Deadline: 2018.02.26 23:59**
  * **HSE Deadline: 2018.01.28 23:59**

* [__week2__](https://github.com/yandexdataschool/Practical_RL/tree/master/week2_value_based) Value-based methods
  * Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
  * Seminar: Value iteration.
  * Homework description - see week2/README.md.
  * **HSE Deadline: 2018.02.11 23:59**
  * **YSDA Deadline: part1 2018.03.05 23:59, part2 2018.03.12 23:59**

* [__week3__](https://github.com/yandexdataschool/Practical_RL/tree/master/week3_model_free) Model-free reinforcement learning
  * Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(λ).
  * Seminar: Q-learning vs SARSA vs Expected Value SARSA (a minimal update-rule sketch follows this entry)
  * Homework description - see week3/README.md.
  * **HSE Deadline: 2018.02.15 23:59**
  * **YSDA Deadline: 2018.03.12 23:59**
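To make the week 3 off-policy vs on-policy distinction concrete, here is a minimal tabular sketch of the Q-learning and SARSA updates. It is illustrative only: the dict-based Q-table, variable names and hyperparameter values are assumptions, not the course's reference code.

```python
from collections import defaultdict

Q = defaultdict(float)       # Q[(state, action)] -> value estimate
alpha, gamma = 0.1, 0.99     # learning rate and discount factor (assumed values)

def q_learning_update(s, a, r, s_next, actions):
    # Off-policy: bootstrap from the greedy next action, whatever the agent actually does next.
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Expected Value SARSA, the third algorithm in the seminar, replaces the sampled `Q[(s_next, a_next)]` with the expectation of Q over the policy's action probabilities.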
* [__week4_recap__](https://github.com/yandexdataschool/Practical_RL/tree/master/week4_%5Brecap%5D_deep_learning) - deep learning recap
  * Lecture: Deep learning 101
  * Seminar: Simple image classification with convnets

* [__week4__](https://github.com/yandexdataschool/Practical_RL/tree/master/week4_approx_rl) Approximate reinforcement learning
  * Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
  * Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
  * **HSE Deadline: 2018.03.04 23:30**
  * **YSDA Deadline: 2018.03.20 23:30**

* [__week5__](https://github.com/yandexdataschool/Practical_RL/tree/master/week5_explore) Exploration in reinforcement learning
  * Lecture: Contextual bandits. Thompson Sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
  * Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
  * **YSDA Deadline: 2018.03.30 23:30**

* [__week6__](https://github.com/yandexdataschool/Practical_RL/tree/master/week6_policy_based) Policy gradient methods I
  * Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/crossentropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
  * Seminar: REINFORCE, advantage actor-critic (a minimal REINFORCE step sketch follows this entry)
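For week 6, a REINFORCE gradient step in PyTorch (one of the course's two frameworks) looks roughly like the sketch below. The network sizes, tensor shapes and the absence of a baseline are assumptions for illustration, not the assignment's reference solution.

```python
import torch
import torch.nn as nn

# Toy policy for a CartPole-like task: 4 observation features -> 2 discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, actions, returns):
    """One gradient step on -E[log pi(a|s) * G_t].

    states: float tensor [T, 4], actions: long tensor [T], returns: float tensor [T].
    """
    log_probs = torch.log_softmax(policy(states), dim=-1)      # log pi(.|s)
    chosen = log_probs[torch.arange(len(actions)), actions]    # log pi(a_t|s_t)
    loss = -(chosen * returns).mean()                          # minus sign: we maximize return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Subtracting a learned state-value baseline from `returns` is the "variance reduction (baseline)" item from the lecture, and is usually what makes this work in practice.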
* [__week7_recap__](https://github.com/yandexdataschool/Practical_RL/tree/master/week7_%5Brecap%5D_rnn) Recurrent neural networks recap
  * Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping.
  * Seminar: character-level RNN language model

* [__week7__](https://github.com/yandexdataschool/Practical_RL/tree/master/week7_pomdp) Partially observable MDPs
  * Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
  * Seminar: Deep kung-fu & doom with recurrent A3C and DRQN

* [__week8__](https://github.com/yandexdataschool/Practical_RL/tree/master/week8_scst) Applications II
  * Lecture: Reinforcement learning as a general way to optimize non-differentiable loss. G2P, machine translation, conversation models, image captioning, discrete GANs. Self-critical sequence training.
  * Seminar: Simple neural machine translation with self-critical sequence training

* [__week9__](https://github.com/yandexdataschool/Practical_RL/tree/master/week9_policy_II) Policy gradient methods II
  * Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG. Bonus: DPG for discrete action spaces.
  * Seminar: Approximate TRPO for simple robotic tasks (a minimal deterministic-policy-gradient sketch appears at the end of this README).

* [Some after-course bonus materials](https://github.com/yandexdataschool/Practical_RL/tree/master/yet_another_week)


# Course staff
Course materials and teaching by: _[unordered]_
- [Pavel Shvechikov](https://github.com/bestxolodec) - lectures, seminars, hw checkups, reading group
- [Oleg Vasilev](https://github.com/Omrigan) - seminars, hw checkups, technical support
- [Alexander Fritsler](https://github.com/Fritz449) - lectures, seminars, hw checkups
- [Nikita Putintsev](https://github.com/qwasser) - seminars, hw checkups, organizing our hot mess
- [Fedor Ratnikov](https://github.com/justheuristic/) - lectures, seminars, hw checkups
- [Alexey Umnov](https://github.com/alexeyum) - seminars, hw checkups

# Contributions
* Using pictures from the [Berkeley AI course](http://ai.berkeley.edu/home.html)
* Massively referring to [CS294](http://rll.berkeley.edu/deeprlcourse/)
* Several TensorFlow assignments by [Scitator](https://github.com/Scitator)
* A lot of fixes from [arogozhnikov](https://github.com/arogozhnikov)
* Other awesome people: see GitHub contributors
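For the curious: the deterministic policy gradient behind DDPG (week 9) boils down to pushing the actor's actions toward higher critic values. The toy networks and dimensions below are assumptions for illustration, not course code.

```python
import torch
import torch.nn as nn

# Hypothetical continuous-control setup: 3-dim state, 1-dim action in [-1, 1].
actor = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(3 + 1, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def ddpg_actor_step(states):
    """Deterministic policy gradient: maximize Q(s, pi(s)) w.r.t. the actor parameters."""
    actions = actor(states)
    q_values = critic(torch.cat([states, actions], dim=-1))
    loss = -q_values.mean()          # minus sign turns gradient ascent on Q into a minimization
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()
```

The critic itself is trained with the usual TD target, plus the target networks and replay buffer already familiar from week 4's DQN.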