{"id":13934799,"url":"https://github.com/udacity/deep-reinforcement-learning","last_synced_at":"2025-07-19T19:32:11.285Z","repository":{"id":38007363,"uuid":"140018843","full_name":"udacity/deep-reinforcement-learning","owner":"udacity","description":"Repo for the Deep Reinforcement Learning Nanodegree program","archived":false,"fork":false,"pushed_at":"2023-11-16T01:05:06.000Z","size":3530,"stargazers_count":4827,"open_issues_count":5,"forks_count":2336,"subscribers_count":179,"default_branch":"master","last_synced_at":"2024-08-08T23:18:53.919Z","etag":null,"topics":["cross-entropy","ddpg","deep-reinforcement-learning","dqn","dynamic-programming","hill-climbing","ml-agents","neural-networks","openai-gym","openai-gym-solutions","ppo","pytorch","pytorch-rl","reinforcement-learning","reinforcement-learning-algorithms","rl-algorithms"],"latest_commit_sha":null,"homepage":"https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/udacity.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2018-07-06T18:36:23.000Z","updated_at":"2024-08-05T15:42:37.000Z","dependencies_parsed_at":"2022-07-19T03:47:06.064Z","dependency_job_id":null,"html_url":"https://github.com/udacity/deep-reinforcement-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/udacity%2Fdeep-reinforcement-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/udacity%2Fdeep-reinforcement-learning/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/udacity%2Fdeep-reinforcement-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/udacity%2Fdeep-reinforcement-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/udacity","download_url":"https://codeload.github.com/udacity/deep-reinforcement-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226666437,"owners_count":17665030,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-entropy","ddpg","deep-reinforcement-learning","dqn","dynamic-programming","hill-climbing","ml-agents","neural-networks","openai-gym","openai-gym-solutions","ppo","pytorch","pytorch-rl","reinforcement-learning","reinforcement-learning-algorithms","rl-algorithms"],"created_at":"2024-08-07T23:01:14.586Z","updated_at":"2024-11-27T02:30:52.663Z","avatar_url":"https://github.com/udacity.png","language":"Jupyter Notebook","readme":"[//]: # (Image References)\n\n[image1]: https://user-images.githubusercontent.com/10624937/42135602-b0335606-7d12-11e8-8689-dd1cf9fa11a9.gif \"Trained Agents\"\n[image2]: https://user-images.githubusercontent.com/10624937/42386929-76f671f0-8106-11e8-9376-f17da2ae852e.png \"Kernel\"\n\n# Deep Reinforcement Learning Nanodegree\n\n![Trained Agents][image1]\n\nThis repository contains material related to Udacity's [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.  
\n\n## Table of Contents\n\n### Tutorials\n\nThe tutorials lead you through implementing various algorithms in reinforcement learning.  All of the code is in PyTorch (v0.4) and Python 3.\n\n* [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/tree/master/dynamic-programming): Implement Dynamic Programming algorithms such as Policy Evaluation, Policy Improvement, Policy Iteration, and Value Iteration. \n* [Monte Carlo](https://github.com/udacity/deep-reinforcement-learning/tree/master/monte-carlo): Implement Monte Carlo methods for prediction and control. \n* [Temporal-Difference](https://github.com/udacity/deep-reinforcement-learning/tree/master/temporal-difference): Implement Temporal-Difference methods such as Sarsa, Q-Learning, and Expected Sarsa. \n* [Discretization](https://github.com/udacity/deep-reinforcement-learning/tree/master/discretization): Learn how to discretize continuous state spaces, and solve the Mountain Car environment.\n* [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/tree/master/tile-coding): Implement a method for discretizing continuous state spaces that enables better generalization.\n* [Deep Q-Network](https://github.com/udacity/deep-reinforcement-learning/tree/master/dqn): Explore how to use a Deep Q-Network (DQN) to navigate a space vehicle without crashing.\n* [Robotics](https://github.com/dusty-nv/jetson-reinforcement): Use a C++ API to train reinforcement learning agents from virtual robotic simulation in 3D. 
(_External link_)\n* [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/tree/master/hill-climbing): Use hill climbing with adaptive noise scaling to balance a pole on a moving cart.\n* [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/tree/master/cross-entropy): Use the cross-entropy method to train a car to navigate a steep hill.\n* [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/tree/master/reinforce): Learn how to use Monte Carlo Policy Gradients to solve a classic control task.\n* **Proximal Policy Optimization**: Explore how to use Proximal Policy Optimization (PPO) to solve a classic reinforcement learning task. (_Coming soon!_)\n* **Deep Deterministic Policy Gradients**: Explore how to use Deep Deterministic Policy Gradients (DDPG) with OpenAI Gym environments.\n  * [Pendulum](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-pendulum): Use OpenAI Gym's Pendulum environment.\n  * [BipedalWalker](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-bipedal): Use OpenAI Gym's BipedalWalker environment.\n* [Finance](https://github.com/udacity/deep-reinforcement-learning/tree/master/finance): Train an agent to discover optimal trading strategies.\n\n### Labs / Projects\n\nThe labs and projects can be found below.  All of the projects use rich simulation environments from [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents). In the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program, you will receive a review of your project.  
These reviews are meant to give you personalized feedback and to tell you what can be improved in your code.\n\n* [The Taxi Problem](https://github.com/udacity/deep-reinforcement-learning/tree/master/lab-taxi): In this lab, you will train a taxi to pick up and drop off passengers.\n* [Navigation](https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation): In the first project, you will train an agent to collect yellow bananas while avoiding blue bananas.\n* [Continuous Control](https://github.com/udacity/deep-reinforcement-learning/tree/master/p2_continuous-control): In the second project, you will train a robotic arm to reach target locations.\n* [Collaboration and Competition](https://github.com/udacity/deep-reinforcement-learning/tree/master/p3_collab-compet): In the third project, you will train a pair of agents to play tennis! \n\n### Resources\n\n* [Cheatsheet](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet): You are encouraged to use [this PDF file](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet/cheatsheet.pdf) to guide your study of reinforcement learning. 
\n\n## OpenAI Gym Benchmarks\n\n### Classic Control\n- `Acrobot-v1` with [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/blob/master/tile-coding/Tile_Coding_Solution.ipynb) and Q-Learning  \n- `CartPole-v0` with [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/blob/master/hill-climbing/Hill_Climbing.ipynb) | solved in 13 episodes\n- `CartPole-v0` with [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/blob/master/reinforce/REINFORCE.ipynb) | solved in 691 episodes \n- `MountainCarContinuous-v0` with [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/blob/master/cross-entropy/CEM.ipynb) | solved in 47 iterations\n- `MountainCar-v0` with [Uniform-Grid Discretization](https://github.com/udacity/deep-reinforcement-learning/blob/master/discretization/Discretization_Solution.ipynb) and Q-Learning | solved in \u003c50000 episodes\n- `Pendulum-v0` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-pendulum/DDPG.ipynb)\n\n### Box2D\n- `BipedalWalker-v2` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-bipedal/DDPG.ipynb)\n- `CarRacing-v0` with **Deep Q-Networks (DQN)** | _Coming soon!_\n- `LunarLander-v2` with [Deep Q-Networks (DQN)](https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb) | solved in 1504 episodes\n\n### Toy Text\n- `FrozenLake-v0` with [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/blob/master/dynamic-programming/Dynamic_Programming_Solution.ipynb)\n- `Blackjack-v0` with [Monte Carlo Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/monte-carlo/Monte_Carlo_Solution.ipynb)\n- `CliffWalking-v0` with [Temporal-Difference 
Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/temporal-difference/Temporal_Difference_Solution.ipynb)\n\n## Dependencies\n\nTo set up your Python environment to run the code in this repository, follow the instructions below.\n\n1. Create (and activate) a new environment with Python 3.6.\n\n\t- __Linux__ or __Mac__: \n\t```bash\n\tconda create --name drlnd python=3.6\n\tsource activate drlnd\n\t```\n\t- __Windows__: \n\t```bash\n\tconda create --name drlnd python=3.6 \n\tactivate drlnd\n\t```\n\t\n2. If running in **Windows**, ensure you have the \"Build Tools for Visual Studio 2019\" installed from this [site](https://visualstudio.microsoft.com/downloads/).  This [article](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30) may also be very helpful.  This was confirmed to work in Windows 10 Home.  \n\n3. Follow the instructions in [this repository](https://github.com/openai/gym) to perform a minimal install of OpenAI Gym.  \n\t- Next, install the **classic control** environment group by following the instructions [here](https://github.com/openai/gym#classic-control).\n\t- Then, install the **box2d** environment group by following the instructions [here](https://github.com/openai/gym#box2d).\n\t\n4. Clone the repository (if you haven't already!), and navigate to the `python/` folder.  Then, install several dependencies.  \n    ```bash\n    git clone https://github.com/udacity/deep-reinforcement-learning.git\n    cd deep-reinforcement-learning/python\n    pip install .\n    ```\n\n5. Create an [IPython kernel](http://ipython.readthedocs.io/en/stable/install/kernel_install.html) for the `drlnd` environment.    \n    ```bash\n    python -m ipykernel install --user --name drlnd --display-name \"drlnd\"\n    ```\n\n6. Before running code in a notebook, change the kernel to match the `drlnd` environment by using the drop-down `Kernel` menu. 
\n\n![Kernel][image2]\n\n## Want to learn more?\n\n\u003cp align=\"center\"\u003eCome learn with us in the \u003ca href=\"https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893\"\u003eDeep Reinforcement Learning Nanodegree\u003c/a\u003e program at Udacity!\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003ca href=\"https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893\"\u003e\n \u003cimg width=\"503\" height=\"133\" src=\"https://user-images.githubusercontent.com/10624937/42135812-1829637e-7d16-11e8-9aa1-88056f23f51e.png\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n","funding_links":[],"categories":["Hands-on Reinforcement Learning Resources","Jupyter Notebook"],"sub_categories":["Implementation of Algorithms"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fudacity%2Fdeep-reinforcement-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fudacity%2Fdeep-reinforcement-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fudacity%2Fdeep-reinforcement-learning/lists"}