{"id":15043582,"url":"https://github.com/mimoralea/gdrl","last_synced_at":"2025-04-12T23:30:35.088Z","repository":{"id":42052933,"uuid":"125292414","full_name":"mimoralea/gdrl","owner":"mimoralea","description":"Grokking Deep Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2022-02-04T21:19:47.000Z","size":659633,"stargazers_count":889,"open_issues_count":13,"forks_count":248,"subscribers_count":29,"default_branch":"master","last_synced_at":"2025-04-04T03:04:35.977Z","etag":null,"topics":["algorithms","artificial-intelligence","deep-learning","deep-reinforcement-learning","docker","gpu","machine-learning","neural-networks","numpy","numpy-tutorial","nvidia-docker","pytorch","pytorch-tutorials","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://www.manning.com/books/grokking-deep-reinforcement-learning","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mimoralea.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-15T00:51:51.000Z","updated_at":"2025-04-02T11:46:28.000Z","dependencies_parsed_at":"2022-08-04T19:01:31.086Z","dependency_job_id":null,"html_url":"https://github.com/mimoralea/gdrl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mimoralea%2Fgdrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mimoralea%2Fgdrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mimoralea%2Fgdrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/mimoralea%2Fgdrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mimoralea","download_url":"https://codeload.github.com/mimoralea/gdrl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248647249,"owners_count":21139081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","artificial-intelligence","deep-learning","deep-reinforcement-learning","docker","gpu","machine-learning","neural-networks","numpy","numpy-tutorial","nvidia-docker","pytorch","pytorch-tutorials","reinforcement-learning"],"created_at":"2024-09-24T20:49:18.223Z","updated_at":"2025-04-12T23:30:35.054Z","avatar_url":"https://github.com/mimoralea.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grokking Deep Reinforcement Learning\n\n**Note:** At the moment, only running the code from the [docker](https://github.com/docker/docker-ce) container (below) is supported. Docker allows for creating a single environment that is more likely to work on all systems. Basically, I install and configure all packages for you, except docker itself, and you just run the code on a tested environment. \n\nTo install docker, I recommend a web search for \"installing docker on \\\u003cyour os here\u003e\". For running the code on a GPU, you have to additionally install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). NVIDIA Docker allows for using a host's GPUs inside docker containers. 
After you have docker (and nvidia-docker if using a GPU) installed, follow the steps below. \n\n## Running the code\n  0. Clone this repo:  \n  `git clone --depth 1 https://github.com/mimoralea/gdrl.git \u0026\u0026 cd gdrl`\n  1. Pull the gdrl image with:  \n  `docker pull mimoralea/gdrl:v0.14`\n  2. Spin up a container:\n     - On Mac or Linux:  \n     `docker run -it --rm -p 8888:8888 -v \"$PWD\"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14` \n     - On Windows:  \n     `docker run -it --rm -p 8888:8888 -v %CD%/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14`\n     - NOTE: Use `nvidia-docker` or add `--gpus all` after `--rm` to the command, if you are using a GPU.\n  3. Open a browser and go to the URL shown in the terminal (likely to be: http://localhost:8888). The password is: `gdrl`\n\n## About the book\n\n### Book's website\n\nhttps://www.manning.com/books/grokking-deep-reinforcement-learning\n\n### Table of contents\n\n  1. [Introduction to deep reinforcement learning](#1-introduction-to-deep-reinforcement-learning)\n  2. [Mathematical foundations of reinforcement learning](#2-mathematical-foundations-of-reinforcement-learning)\n  3. [Balancing immediate and long-term goals](#3-balancing-immediate-and-long-term-goals)\n  4. [Balancing the gathering and utilization of information](#4-balancing-the-gathering-and-utilization-of-information)\n  5. [Evaluating agents' behaviors](#5-evaluating-agents-behaviors)\n  6. [Improving agents' behaviors](#6-improving-agents-behaviors)\n  7. [Achieving goals more effectively and efficiently](#7-achieving-goals-more-effectively-and-efficiently)\n  8. [Introduction to value-based deep reinforcement learning](#8-introduction-to-value-based-deep-reinforcement-learning)\n  9. [More stable value-based methods](#9-more-stable-value-based-methods)\n  10. [Sample-efficient value-based methods](#10-sample-efficient-value-based-methods)\n  11. 
[Policy-gradient and actor-critic methods](#11-policy-gradient-and-actor-critic-methods)\n  12. [Advanced actor-critic methods](#12-advanced-actor-critic-methods)\n  13. [Towards artificial general intelligence](#13-towards-artificial-general-intelligence)\n\n### Detailed table of contents\n\n#### 1. Introduction to deep reinforcement learning\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-1)\\)\n- \\(No Notebook\\)\n      \n#### 2. Mathematical foundations of reinforcement learning\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-2)\\)\n- \\([Notebook](/notebooks/chapter_02/chapter-02.ipynb)\\)\n  - Implementations of several MDPs: \n    - Bandit Walk\n    - Bandit Slippery Walk\n    - Slippery Walk Three\n    - Random Walk\n    - Russell and Norvig's Gridworld from AIMA\n    - FrozenLake\n    - FrozenLake8x8\n#### 3. Balancing immediate and long-term goals\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-3)\\)\n- \\([Notebook](/notebooks/chapter_03/chapter-03.ipynb)\\) \n  - Implementations of methods for finding optimal policies:\n    - Policy Evaluation\n    - Policy Improvement\n    - Policy Iteration\n    - Value Iteration\n#### 4. Balancing the gathering and utilization of information\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-4)\\)\n- \\([Notebook](/notebooks/chapter_04/chapter-04.ipynb)\\)\n  - Implementations of exploration strategies for bandit problems:\n    - Random\n    - Greedy\n    - E-greedy\n    - E-greedy with linearly decaying epsilon\n    - E-greedy with exponentially decaying epsilon\n    - Optimistic initialization\n    - SoftMax\n    - Upper Confidence Bound\n    - Bayesian\n#### 5. 
Evaluating agents' behaviors\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-5)\\)\n- \\([Notebook](/notebooks/chapter_05/chapter-05.ipynb)\\)\n  - Implementation of algorithms that solve the prediction problem (policy estimation):\n    - On-policy first-visit Monte-Carlo prediction\n    - On-policy every-visit Monte-Carlo prediction\n    - Temporal-Difference prediction (TD)\n    - n-step Temporal-Difference prediction (n-step TD)\n    - TD(λ)\n#### 6. Improving agents' behaviors\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-6)\\)\n- \\([Notebook](/notebooks/chapter_06/chapter-06.ipynb)\\)\n  - Implementation of algorithms that solve the control problem (policy improvement):\n    - On-policy first-visit Monte-Carlo control\n    - On-policy every-visit Monte-Carlo control\n    - On-policy TD control: SARSA\n    - Off-policy TD control: Q-Learning\n    - Double Q-Learning\n#### 7. Achieving goals more effectively and efficiently\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-7)\\)\n- \\([Notebook](/notebooks/chapter_07/chapter-07.ipynb)\\)\n  - Implementation of more effective and efficient reinforcement learning algorithms:\n    - SARSA(λ) with replacing traces\n    - SARSA(λ) with accumulating traces\n    - Q(λ) with replacing traces\n    - Q(λ) with accumulating traces\n    - Dyna-Q\n    - Trajectory Sampling\n#### 8. Introduction to value-based deep reinforcement learning\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-8)\\)\n- \\([Notebook](/notebooks/chapter_08/chapter-08.ipynb)\\)\n  - Implementation of a value-based deep reinforcement learning baseline:\n    - Neural Fitted Q-iteration (NFQ)\n#### 9. 
More stable value-based methods\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-9)\\)\n- \\([Notebook](/notebooks/chapter_09/chapter-09.ipynb)\\)\n  - Implementation of \"classic\" value-based deep reinforcement learning methods:\n    - Deep Q-Networks (DQN)\n    - Double Deep Q-Networks (DDQN)\n#### 10. Sample-efficient value-based methods\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-10)\\)\n- \\([Notebook](/notebooks/chapter_10/chapter-10.ipynb)\\)\n  - Implementation of main improvements for value-based deep reinforcement learning methods:\n    - Dueling Deep Q-Networks (Dueling DQN)\n    - Prioritized Experience Replay (PER)\n#### 11. Policy-gradient and actor-critic methods\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-11)\\)\n- \\([Notebook](/notebooks/chapter_11/chapter-11.ipynb)\\)\n  - Implementation of classic policy-based and actor-critic deep reinforcement learning methods:\n    - Policy Gradients without value function and Monte-Carlo returns (REINFORCE)\n    - Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)  \n    - Asynchronous Advantage Actor-Critic (A3C)\n    - Generalized Advantage Estimation (GAE)\n    - \\[Synchronous\\] Advantage Actor-Critic (A2C)\n#### 12. Advanced actor-critic methods\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-12)\\)\n- \\([Notebook](/notebooks/chapter_12/chapter-12.ipynb)\\)\n  - Implementation of advanced actor-critic methods:\n    - Deep Deterministic Policy Gradient (DDPG)\n    - Twin Delayed Deep Deterministic Policy Gradient (TD3)\n    - Soft Actor-Critic (SAC)\n    - Proximal Policy Optimization (PPO)\n#### 13. 
Towards artificial general intelligence\n- \\([Livebook](https://livebook.manning.com/book/grokking-deep-reinforcement-learning/chapter-13)\\)\n- \\(No Notebook\\)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmimoralea%2Fgdrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmimoralea%2Fgdrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmimoralea%2Fgdrl/lists"}