https://github.com/sudharsan13296/hands-on-reinforcement-learning-with-python

Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
asynchronous-advantage-actor-critic deep-deterministic-policy-gradient deep-learning-algorithms deep-q-network deep-recurrent-q-network deep-reinforcement-learning double-dqn drqn dueling-dqn hindsight-experience-replay markov-decision-processes monte-carlo openai-gym policy-gradient policy-gradients ppo q-learning reinforcement-learning sarsa trpo

Host: GitHub
URL: https://github.com/sudharsan13296/hands-on-reinforcement-learning-with-python
Owner: sudharsan13296
Created: 2018-06-11T15:53:06.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2020-10-07T12:12:17.000Z (almost 6 years ago)
Last Synced: 2025-03-28T06:08:33.755Z (over 1 year ago)
Language: Jupyter Notebook
Homepage: https://www.amazon.com/dp/B079Q3WLM4/ref=sr_1_1?ie=UTF8&qid=1518175121&sr=8-1&keywords=hands+on+reinforcement+learning+with+python
Size: 41.9 MB
Stars: 846
Watchers: 43
Forks: 325
Open Issues: 2
Metadata Files:
- Readme: README.md

## Check out the completely revised and updated second edition of this book, which covers basic to advanced deep RL algorithms with extensive math. Check out the new repo [here](https://github.com/sudharsan13296/Deep-Reinforcement-Learning-With-Python).

# [Hands-On Reinforcement Learning With Python](https://www.amazon.com/Hands-Reinforcement-Learning-Python-reinforcement-ebook/dp/B079Q3WLM4)

### Master reinforcement and deep reinforcement learning using OpenAI Gym and TensorFlow

## About the book

Hands-On Reinforcement Learning with Python will help you master reinforcement learning, from basic algorithms to advanced deep reinforcement learning algorithms.

The book starts with an introduction to reinforcement learning, followed by OpenAI Gym and TensorFlow. You will then explore various RL algorithms and concepts such as Markov decision processes, Monte Carlo methods, and dynamic programming, including value and policy iteration. This example-rich guide will introduce you to deep learning, covering various deep learning algorithms. You will then explore deep reinforcement learning, which combines deep learning and reinforcement learning, in depth. You will master various deep reinforcement learning algorithms such as DQN, Double DQN, Dueling DQN, DRQN, A3C, DDPG, TRPO, and PPO. You will also learn about recent advancements in reinforcement learning such as imagination-augmented agents, learning from human preferences, DQfD, HER, and many more.

## Get the book

## Get the Chinese Version (中文版)

The book has also been translated into Chinese, and you can get it here (这本书也被翻译成中文，你可以从这里得到它): https://item.jd.com/12506442.html

## Table of Contents

### [1. Introduction to Reinforcement Learning](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/01.%20Introduction%20to%20Reinforcement%20Learning)

* [1.1. What is Reinforcement Learning?](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/01.%20Introduction%20to%20Reinforcement%20Learning/1.1%20What%20is%20Reinforcement%20Learning.ipynb)
* 1.2. Reinforcement Learning Cycle
* 1.3. How RL Differs from Other ML Paradigms
* 1.4. Elements of Reinforcement Learning
* 1.5. Agent Environment Interface
* 1.6. Types of RL Environments
* 1.7. Reinforcement Learning Platforms
* 1.8. Applications of Reinforcement Learning

### [2. Getting Started with OpenAI and TensorFlow](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow)

* 2.1. Setting Up Your Machine
* 2.2. Installing Anaconda
* 2.3. Installing Docker
* 2.4. Installing OpenAI Gym and Universe
* 2.5. Common Error Fixes
* 2.6. OpenAI Gym
* [2.7. Basic Simulations](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow/2.07%20Basic%20Simulations.ipynb)
* [2.8. Training a Robot to Walk](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow/2.08%20Training%20an%20Robot%20to%20Walk.ipynb)
* [2.9. Building a Video Game Bot](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow/2.09%20Building%20a%20Video%20Game%20Bot%20.ipynb)
* [2.10. TensorFlow Fundamentals](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow/2.10%20TensorFlow%20Fundamentals.ipynb)
* [2.11. TensorBoard](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/02.%20Getting%20Started%20with%20OpenAI%20and%20Tensorflow/2.11%20TensorBoard.ipynb)

### [3. Markov Decision Process and Dynamic Programming](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/03.%20Markov%20Decision%20Process%20and%20Dynamic%20Programming)

* 3.1. Markov Chain and Markov Process
* 3.2. Markov Decision Process
* 3.3. Rewards and Returns
* 3.4. Episodic and Continuous Tasks
* 3.5. Policy Function
* 3.6. State Value Function
* 3.7. State-Action Value Function (Q Function)
* 3.8. Bellman Equation and Optimality
* 3.9. Deriving Bellman Equation for Value and Q functions
* 3.10. Solving the Bellman Equation
* 3.11. Dynamic Programming
* [3.12. Solving Frozen Lake Problem using Value Iteration](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/03.%20Markov%20Decision%20Process%20and%20Dynamic%20Programming/3.12%20Value%20Iteration%20-%20Frozen%20Lake%20Problem.ipynb)
* [3.13. Solving Frozen Lake Problem using Policy Iteration](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/03.%20Markov%20Decision%20Process%20and%20Dynamic%20Programming/3.13%20Policy%20Iteration%20-%20Frozen%20Lake%20Problem.ipynb)
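
As a rough illustration of the value iteration topic listed above, here is a minimal sketch. It is not the code from the notebooks: it assumes a hypothetical four-state chain MDP instead of Frozen Lake, and the names `build_transitions`, `value_iteration`, and `greedy_policy` are made up for this example.

```python
# Hypothetical toy MDP (not Frozen Lake): states 0..3 in a row, state 3 is terminal.
# transitions[state][action] is a list of (probability, next_state, reward) tuples.
LEFT, RIGHT = 0, 1
N_STATES = 4
GOAL = 3


def build_transitions():
    transitions = {}
    for s in range(N_STATES):
        if s == GOAL:
            # Terminal state: every action stays put and yields no reward.
            transitions[s] = {a: [(1.0, s, 0.0)] for a in (LEFT, RIGHT)}
            continue
        left, right = max(s - 1, 0), s + 1
        transitions[s] = {
            LEFT: [(1.0, left, 0.0)],
            # Reaching the goal pays a reward of 1, every other move pays 0.
            RIGHT: [(1.0, right, 1.0 if right == GOAL else 0.0)],
        }
    return transitions


def action_value(outcomes, values, gamma):
    # Expected one-step return of an action under the current value estimates.
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    values = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: take the best action value.
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    # Pick, for each state, the action with the highest one-step lookahead value.
    policy = []
    for s, actions in transitions.items():
        q = {a: action_value(o, values, gamma) for a, o in actions.items()}
        policy.append(max(q, key=q.get))
    return policy


transitions = build_transitions()
values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
```

In this toy problem the greedy policy moves right from every non-terminal state, since that is the only way to collect the reward.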

### [4. Gaming with Monte Carlo Methods](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/04.%20Gaming%20with%20Monte%20Carlo%20Methods)

* 4.1. Monte Carlo Methods
* [4.2. Estimating Value of Pi Using Monte Carlo](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/04.%20Gaming%20with%20Monte%20Carlo%20Methods/4.2%20Estimating%20Value%20of%20Pi%20using%20Monte%20Carlo.ipynb)
* 4.3. Monte Carlo Prediction
* 4.4. First visit Monte Carlo
* 4.5. Every visit Monte Carlo
* [4.6. BlackJack with Monte Carlo](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/04.%20Gaming%20with%20Monte%20Carlo%20Methods/4.6%20BlackJack%20with%20First%20visit%20MC.ipynb)
* 4.7. Monte Carlo Control
* 4.8. Monte Carlo Exploration Starts
* 4.9. On Policy Monte Carlo Control
* 4.10. Off Policy Monte Carlo Control
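
To give a flavor of the Monte Carlo idea behind section 4.2, the sketch below estimates pi by random sampling. It is an assumption-laden illustration rather than the notebook's code: the function name `estimate_pi` and its `seed` parameter are hypothetical, and it uses only the standard library.

```python
import random


def estimate_pi(num_samples, seed=0):
    # Sample points uniformly in the unit square and count how many fall
    # inside the quarter circle of radius 1 centered at the origin.
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of square = pi / 4.
    return 4.0 * inside / num_samples


approx_pi = estimate_pi(100_000)
```

The estimate gets closer to pi as `num_samples` grows, with the error shrinking roughly in proportion to one over the square root of the sample count.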

### [5. Temporal Difference Learning](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/05.%20Temporal%20Difference%20Learning)

* 5.1. Temporal Difference Learning
* 5.2. TD Prediction
* 5.3. TD Control
* 5.4. Q Learning
* [5.5. Solving the Taxi Problem using Q learning](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/5.%20Temporal%20Difference%20Learning/05.5%20Taxi%20Problem%20-%20Q%20Learning.ipynb)
* 5.6. SARSA
* [5.7. Solving the Taxi Problem using SARSA](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/5.%20Temporal%20Difference%20Learning/05.7%20Taxi%20Problem%20-%20SARSA.ipynb)
* 5.8. Difference Between Q learning and SARSA
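
The difference between Q learning and SARSA named in section 5.8 comes down to the bootstrap target. The sketch below is one way to write the two tabular updates side by side. It is not taken from the notebooks: the function names, the dictionary-based Q table, and the default `alpha` and `gamma` values are assumptions made for this example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    # Off-policy: bootstrap from the best action value in the next state,
    # regardless of which action the agent actually takes next.
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    # On-policy: bootstrap from the action the agent actually takes next.
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


# Two separate Q tables with the same hypothetical starting values.
q_ql = defaultdict(float, {("s1", "a"): 1.0, ("s1", "b"): 2.0})
q_sarsa = defaultdict(float, {("s1", "a"): 1.0, ("s1", "b"): 2.0})

q_learning_update(q_ql, "s0", "a", 1.0, "s1", actions=["a", "b"])
sarsa_update(q_sarsa, "s0", "a", 1.0, "s1", next_action="a")
```

With these starting values, the Q learning update uses the larger next-state value (action `b`), while SARSA uses the value of the action it was told the agent takes next (action `a`), so the two tables end up with different estimates for `("s0", "a")`.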

### [6. Multi-Armed Bandit Problem](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/06.%20Multi-Armed%20Bandit%20Problem)

* [6.1. Multi-armed Bandit Problem](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/06.%20Multi-Armed%20Bandit%20Problem/6.1%20MAB%20-%20Various%20Exploration%20Strategies.ipynb)
* 6.2. Epsilon-Greedy Algorithm
* 6.3. Softmax Exploration Algorithm
* 6.4. Upper Confidence Bound Algorithm
* 6.5. Thompson Sampling Algorithm
* 6.6. Applications of MAB
* [6.7. Identifying Right Advertisement Banner Using MAB](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/06.%20Multi-Armed%20Bandit%20Problem/6.7%20Identifying%20Right%20AD%20Banner%20Using%20MAB.ipynb)
* 6.8. Contextual Bandits
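
As a small illustration of the epsilon-greedy strategy from section 6.2, the sketch below runs it on a Bernoulli bandit. It is a minimal sketch under stated assumptions, not the notebook's implementation: the function name `epsilon_greedy_bandit`, the reward probabilities, and the default parameters are all hypothetical.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli reward: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

A larger `epsilon` spends more pulls on exploration, which gives better estimates for every arm at the cost of pulling the best arm less often.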

### [7. Deep Learning Fundamentals](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/07.%20Deep%20Learning%20Fundamentals)

* 7.1. Artificial Neurons
* 7.2. Artificial Neural Network
* 7.3. Activation Functions
* 7.4. Deep Dive into ANN
* 7.5. Gradient Descent
* [7.6. Neural Networks in TensorFlow](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/07.%20Deep%20Learning%20Fundamentals/7.6%20Neural%20Network%20Using%20Tensorflow.ipynb)
* 7.7. Recurrent Neural Network
* 7.8. Backpropagation Through Time
* 7.9. Long Short Term Memory RNN
* [7.10. Generating Song Lyrics using LSTM RNN](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/07.%20Deep%20Learning%20Fundamentals/7.10%20Generating%20Song%20Lyrics%20Using%20LSTM%20RNN.ipynb)
* 7.11. Convolutional Neural Networks
* 7.12. CNN Architecture
* [7.13. Classifying Fashion Products Using CNN](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/07.%20Deep%20Learning%20Fundamentals/7.13%20Classifying%20Fashion%20Products%20Using%20CNN.ipynb)

### [8. Atari Games With Deep Q Network](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/08.%20Atari%20Games%20with%20DQN)

* 8.1. What is a Deep Q Network?
* 8.2. Architecture of DQN
* 8.3. Convolutional Network
* 8.4. Experience Replay
* 8.5. Target Network
* 8.6. Clipping Rewards
* 8.7. DQN Algorithm
* [8.8. Building an Agent to Play Atari Games](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/08.%20Atari%20Games%20with%20DQN/8.8%20Building%20an%20Agent%20to%20Play%20Atari%20Games.ipynb)
* 8.9. Double DQN
* 8.10. Dueling Architecture

### [9. Playing Doom With Deep Recurrent Q Network](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/09.%20Playing%20Doom%20Game%20using%20DRQN)

* 9.1. Deep Recurrent Q Network
* 9.2. Partially Observable MDP
* 9.3. Architecture of DRQN
* [9.4. Basic Doom Game](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/09.%20Playing%20Doom%20Game%20using%20DRQN/9.4%20Basic%20Doom%20Game.ipynb)
* [9.5. Build an Agent to Play Doom Game using DRQN](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/09.%20Playing%20Doom%20Game%20using%20DRQN/9.5%20Doom%20Game%20Using%20DRQN.ipynb)
* 9.6. Deep Attention Recurrent Q Network

### [10. Asynchronous Advantage Actor Critic Network](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/10.%20Aysnchronous%20Advantage%20Actor%20Critic%20Network)

* 10.1. Asynchronous Actor Critic Algorithm
* 10.2. The three A's
* 10.3. Architecture of A3C
* 10.4. Working of A3C
* [10.5. Drive up the Mountain with A3C](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/10.%20Aysnchronous%20Advantage%20Actor%20Critic%20Network/10.5%20Drive%20up%20the%20Mountain%20Using%20A3C.ipynb)
* 10.6. Visualization in TensorBoard

### [11. Policy Gradients and Optimization](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/11.%20Policy%20Gradients%20and%20Optimization)

* 11.1. Policy Gradient
* [11.2. Lunar Lander Using Policy Gradient](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/11.%20Policy%20Gradients%20and%20Optimization/11.2%20Lunar%20Lander%20Using%20Policy%20Gradients.ipynb)
* 11.3. Deep Deterministic Policy Gradient
* [11.4. Swinging up the Pendulum using DDPG](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/11.%20Policy%20Gradients%20and%20Optimization/11.3%20Swinging%20Up%20the%20Pendulum%20Using%20DDPG.ipynb)
* 11.5. Trust Region Policy Optimization
* 11.6. Proximal Policy Optimization

### [12. Capstone Project: Car Racing using DQN](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN)

* [12.1. Environment Wrapper Functions](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN/12.1%20Environment%20Wrapper%20Functions.ipynb)
* [12.2. Dueling Network](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN/12.2%20Dueling%20network.ipynb)
* [12.3. Replay Buffer](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN/12.3%20Replay%20Memory.ipynb)
* [12.4. Training the Network](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN/12.4%20Training%20the%20network.ipynb)
* [12.5. Car Racing](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/12.%20Capstone%20Project:%20Car%20Racing%20using%20DQN/12.5%20Car%20Racing.ipynb)

### [13. Recent Advancements and Next Steps](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/tree/master/13.%20Recent%20Advancements%20and%20Next%20Steps)

* 13.1. Imagination Augmented Agents
* 13.2. Learning From Human Preference
* [13.3. Deep Q Learning From Demonstrations](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/13.%20Recent%20Advancements%20and%20Next%20Steps/13.3%20Deep%20Q%20Learning%20From%20Demonstrations.ipynb)
* [13.4. Hindsight Experience Replay](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python/blob/master/13.%20Recent%20Advancements%20and%20Next%20Steps/13.4%20Hindsight%20Experience%20Replay.ipynb)
* 13.5. Hierarchical Reinforcement Learning
* 13.6. Inverse Reinforcement Learning
