https://github.com/zubair-irshad/udacity_deep_rl
My solutions (with explanations) to the Udacity Deep Reinforcement Learning Nanodegree Program assignments, mini-projects and projects
- Host: GitHub
- URL: https://github.com/zubair-irshad/udacity_deep_rl
- Owner: zubair-irshad
- Created: 2020-05-08T00:08:55.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-06-02T17:46:18.000Z (over 5 years ago)
- Last Synced: 2025-03-06T09:47:40.524Z (9 months ago)
- Topics: ddpg-algorithm, deep-neural-networks, deep-q-network, deep-reinforcement-learning, dqn, ppo, pytorch, reinforcement-learning, unity-ml-agents
- Language: Jupyter Notebook
- Size: 9.77 MB
- Stars: 1
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
My solutions (with explanations) to the Udacity Deep Reinforcement Learning Nanodegree Program
Current Progress:
Introduction to Deep RL
-----------------------
**Monte Carlo Methods**
- Implementation of [Monte Carlo Methods](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf#page=113) for the environment [Blackjack](https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py)
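
The Monte Carlo approach learns action values from complete episodes: after an episode ends, each visited (state, action) pair is nudged toward the return that actually followed it. The snippet below is a minimal, hypothetical sketch of a constant-alpha, every-visit update, not code from the repository's notebooks. It assumes gym's Blackjack encoding (state = (player sum, dealer's showing card, usable ace), actions 0 = stick and 1 = hit) and an episode stored as a list of (state, action, reward) tuples.

```python
from collections import defaultdict


def mc_update(Q, episode, alpha=0.02, gamma=1.0):
    """Constant-alpha, every-visit Monte Carlo update of Q from one finished episode.

    Q: mapping state -> list of action values (hypothetical layout).
    episode: list of (state, action, reward) tuples in time order.
    """
    G = 0.0
    # Walk the episode backwards so G accumulates the discounted return-to-go.
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        # Move the estimate a fraction alpha toward the observed return.
        Q[state][action] += alpha * (G - Q[state][action])
    return Q


# Two actions per state in Blackjack (assumed: 0 = stick, 1 = hit).
Q = defaultdict(lambda: [0.0, 0.0])
```

Walking backwards lets a single pass compute every return; the undiscounted default (gamma = 1.0) is an assumption that suits short, episodic games such as Blackjack.
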
**Temporal Difference Methods**
- Implementation of [Sarsa](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf#page=154), [Q-learning](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf#page=157) and [Expected Sarsa](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf#page=157) for the environment [CliffWalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py)
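
Sarsa, Q-learning and Expected Sarsa share the same one-step TD update and differ only in how they value the next state. The following is a minimal tabular sketch (not the repository's code) that makes that difference explicit; it assumes `Q` maps each state to a NumPy array of action values and that the behaviour policy is epsilon-greedy. The function names are hypothetical.

```python
import numpy as np


def expected_value(q_next, epsilon):
    """Expected action value under an epsilon-greedy policy over q_next."""
    n = len(q_next)
    probs = np.full(n, epsilon / n)               # exploration mass spread evenly
    probs[np.argmax(q_next)] += 1.0 - epsilon     # remaining mass on the greedy action
    return float(np.dot(probs, q_next))


def td_update(Q, s, a, r, s_next, a_next, alpha, gamma, epsilon, method, done=False):
    """One TD(0) update of Q[s][a]; only the bootstrap target depends on `method`."""
    if done:
        target = r                                          # no bootstrapping past a terminal state
    elif method == "sarsa":
        target = r + gamma * Q[s_next][a_next]              # value of the action actually taken next
    elif method == "q_learning":
        target = r + gamma * np.max(Q[s_next])              # value of the greedy action
    elif method == "expected_sarsa":
        target = r + gamma * expected_value(Q[s_next], epsilon)  # expectation under the policy
    else:
        raise ValueError(f"unknown method: {method}")
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

Sarsa is on-policy (it needs `a_next`), Q-learning ignores `a_next` and bootstraps from the greedy action, and Expected Sarsa averages over the policy, which removes the variance introduced by sampling `a_next`.
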
Deep Value Iterations
-----------------------
**Deep Q Network - DQN**
- Implementation of [Deep Q Network](https://arxiv.org/abs/1312.5602) for the environment [Lunar Lander](https://gym.openai.com/envs/LunarLander-v2/)
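
A DQN replaces the Q-table with a network and regresses its Q-values toward one-step TD targets computed on a minibatch. The NumPy sketch below shows only that arithmetic. It assumes a separate target network whose outputs for the next states are already available (the target network comes from the later 2015 Nature version of DQN rather than the 2013 arXiv paper linked in the bullet) and a plain mean-squared TD error; in PyTorch the same expressions would be written on tensors so gradients reach the local network. Function names are hypothetical.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a') * (1 - done) for a minibatch.

    next_q_target: array of shape (batch, n_actions) from the target network.
    dones: 1.0 where s' is terminal, so no value is bootstrapped from it.
    """
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * max_next * (1.0 - dones)


def dqn_loss(q_local, actions, targets):
    """Mean-squared TD error on the Q-values of the actions actually taken."""
    q_taken = q_local[np.arange(len(actions)), actions]   # pick Q(s, a) per row
    return float(np.mean((targets - q_taken) ** 2))
```

Keeping the targets fixed with respect to the local network during each update is what stabilises training compared with bootstrapping from the network being optimised.
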
Project1: Navigation
-----------------------
- Implemented [Deep Q Network](https://arxiv.org/abs/1312.5602) to train an agent to navigate an environment while avoiding obstacles (paths that lead to negative reward)
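
Training a DQN agent, as in this navigation project, relies on experience replay: transitions are stored and later sampled at random so that consecutive, highly correlated steps are not learned from in sequence. A minimal pure-Python sketch of a uniform replay buffer is below; the class and field names are illustrative assumptions, not the repository's.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)   # oldest transitions are evicted first
        self.rng = random.Random(seed)         # private RNG for reproducible sampling

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

A typical loop adds every transition and starts sampling minibatches only once `len(buffer)` exceeds the batch size.
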
Policy Gradients
-----------------------
**REINFORCE**
- Implementation of the [REINFORCE Algorithm](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf) to teach an agent to play [Pong from scratch](https://gym.openai.com/envs/Pong-v0/)
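
REINFORCE increases the log-probability of each action in proportion to the return that followed it. The NumPy sketch below computes the returns-to-go and the resulting loss value for one episode; it is an illustration only (a real implementation would build the loss from PyTorch tensors so it can be back-propagated), the return standardization is an assumed variance-reduction step not stated in this README, and the function names are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = np.zeros(len(rewards))
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99, normalize=True):
    """Negative REINFORCE objective: -sum_t G_t * log pi(a_t | s_t)."""
    returns = discounted_returns(rewards, gamma)
    if normalize:
        # Standardizing returns (assumed here) reduces gradient variance.
        returns = (returns - returns.mean()) / (returns.std() + 1e-10)
    return float(-np.sum(returns * np.asarray(log_probs)))
```

Minimizing this loss with gradient descent is equivalent to gradient ascent on the expected return estimate.
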
**Proximal Policy Optimization**
- Implementation of [PPO Algorithm](https://openai.com/blog/openai-baselines-ppo/) to teach an agent to play [Pong from scratch](https://gym.openai.com/envs/Pong-v0/)
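
PPO reuses each batch of trajectories for several gradient steps and keeps the new policy close to the one that collected the data by clipping the probability ratio in its surrogate objective. The sketch below computes that clipped objective (to be maximized) in NumPy, assuming per-step log-probabilities under both policies and advantage estimates are already available; epsilon = 0.2 is an assumed default and the function name is hypothetical.

```python
import numpy as np


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean over steps of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)."""
    # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t).
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    advantages = np.asarray(advantages)
    # Taking the minimum makes the objective a pessimistic bound on improvement.
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```

When the ratio leaves the interval [1 - epsilon, 1 + epsilon] in the direction the advantage favours, the clipped term takes over and further movement earns no extra objective.
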
Project2: Continuous Control
-----------------------
- Implemented [Deep Deterministic Policy Gradients](https://spinningup.openai.com/en/latest/algorithms/ddpg.html) to teach a 2-DOF robotic manipulator to reach a goal location. Environment used for training and testing: [Reacher](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#reacher)
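
DDPG trains a deterministic actor and a Q-value critic, each paired with a slowly tracking target copy, and explores by adding noise to the actor's continuous actions. The NumPy sketch below shows two of those pieces: the soft (Polyak) target update and Ornstein-Uhlenbeck exploration noise. The defaults tau = 1e-3, theta = 0.15 and sigma = 0.2 are common choices assumed here, parameters are modelled as plain NumPy arrays, and the names are hypothetical rather than taken from the repository.

```python
import numpy as np


def soft_update(target_params, local_params, tau=1e-3):
    """Polyak averaging: theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    for target, local in zip(target_params, local_params):
        target[...] = tau * local + (1.0 - tau) * target   # update each array in place


class OUNoise:
    """Ornstein-Uhlenbeck process (discretized with dt = 1) for correlated exploration noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta          # strength of the pull back toward mu
        self.sigma = sigma          # scale of the random perturbation
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the start of each episode."""
        self.state = self.mu.copy()

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * self.rng.standard_normal(len(self.state))
        self.state = self.state + dx
        return self.state
```

In an agent loop one would typically call `reset()` at the start of each episode and clip the actor's output plus `sample()` to the environment's action range.
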