Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/ShangtongZhang/reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
https://github.com/ShangtongZhang/reinforcement-learning-an-introduction

artificial-intelligence reinforcement-learning

Last synced: about 1 month ago
JSON representation

Python Implementation of Reinforcement Learning: An Introduction

Host: GitHub
URL: https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
Owner: ShangtongZhang
License: mit
Created: 2016-09-13T16:24:05.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2024-04-16T19:42:47.000Z (2 months ago)
Last Synced: 2024-04-16T23:54:56.354Z (2 months ago)
Topics: artificial-intelligence, reinforcement-learning
Language: Python
Homepage:
Size: 5.8 MB
Stars: 13,162
Watchers: 562
Forks: 4,754
Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

reinforcement-learning-resources - Reinforcement Learning: An Introduction (2nd Edition)
awesome-rl - Python Code
awesome-rl - Python Code
my-awesome-stars - ShangtongZhang/reinforcement-learning-an-introduction - Python Implementation of Reinforcement Learning: An Introduction (Python)
awesome-stars - ShangtongZhang/reinforcement-learning-an-introduction - Python Implementation of Reinforcement Learning: An Introduction (Python)
Awesome-AI-algorithm - Github
awesome-machine-learning-resources - **[Tutorial - learning-an-introduction?style=social) (Table of Contents)
awesome-rl - Python Code
awesome-stars - reinforcement-learning-an-introduction
awesome-stars - reinforcement-learning-an-introduction

README

        # Reinforcement Learning: An Introduction

```diff

@@ I am looking for self-motivated students interested in RL at different levels! @@

@@ Visit https://shangtongzhang.github.io/people/ for more details. @@

```

[![Build Status](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction.svg?branch=master)](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction)

Python replication for Sutton & Barto's book [*Reinforcement Learning: An Introduction (2nd Edition)*](http://incompleteideas.net/book/the-book-2nd.html)

> If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly, and unfortunately I do not have exercise answers for the book.

# Contents 

### Chapter 1

1. Tic-Tac-Toe

### Chapter 2

1. [Figure 2.1: An exemplary bandit problem from the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_1.png)

2. [Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_2.png)

3. [Figure 2.3: Optimistic initial action-value estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_3.png)

4. [Figure 2.4: Average performance of UCB action selection on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_4.png)

5. [Figure 2.5: Average performance of the gradient bandit algorithm](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_5.png)

6. [Figure 2.6: A parameter study of the various bandit algorithms](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_6.png)

### Chapter 3

1. [Figure 3.2: Grid example with random policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_2.png)

2. [Figure 3.5: Optimal solutions to the gridworld example](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_5.png)

### Chapter 4

1. [Figure 4.1: Convergence of iterative policy evaluation on a small gridworld](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_1.png)

2. [Figure 4.2: Jack’s car rental problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_2.png)

3. [Figure 4.3: The solution to the gambler’s problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_3.png)

### Chapter 5

1. [Figure 5.1: Approximate state-value functions for the blackjack policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_1.png)

2. [Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_2.png)

3. [Figure 5.3: Weighted importance sampling](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_3.png)

4. [Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_4.png)

### Chapter 6

1. [Example 6.2: Random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_6_2.png)

2. [Figure 6.2: Batch updating](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_2.png)

3. [Figure 6.3: Sarsa applied to windy grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_3.png)

4. [Figure 6.4: The cliff-walking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_4.png)

5. [Figure 6.6: Interim and asymptotic performance of TD control methods](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_6.png)

6. [Figure 6.7: Comparison of Q-learning and Double Q-learning](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_7.png)

### Chapter 7

1. [Figure 7.2: Performance of n-step TD methods on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_7_2.png)

### Chapter 8

1. [Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_2.png)

2. [Figure 8.4: Average performance of Dyna agents on a blocking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_4.png)

3. [Figure 8.5: Average performance of Dyna agents on a shortcut task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_5.png)

4. [Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_8_4.png)

5. [Figure 8.7: Comparison of efficiency of expected and sample updates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_7.png)

6. [Figure 8.8: Relative efficiency of different update distributions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_8.png)

### Chapter 9

1. [Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_1.png)

2. [Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_2.png)

3. [Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_5.png)

4. [Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_8.png)

5. [Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_10.png)

### Chapter 10

1. [Figure 10.1: The cost-to-go function for Mountain Car task in one run](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_1.png)

2. [Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_2.png)

3. [Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_3.png)

4. [Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_4.png)

5. [Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_5.png)

### Chapter 11

1. [Figure 11.2: Baird's Counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_2.png)

2. [Figure 11.6: The behavior of the TDC algorithm on Baird’s counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_6.png)

3. [Figure 11.7: The behavior of the ETD algorithm in expectation on Baird’s counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_7.png)

### Chapter 12

1. [Figure 12.3: Off-line λ-return algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_3.png)

2. [Figure 12.6: TD(λ) algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_6.png)

3. [Figure 12.8: True online TD(λ) algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_8.png)

4. [Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_10.png)

5. [Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_11.png)

### Chapter 13

1. [Example 13.1: Short corridor with switched actions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_13_1.png)

2. [Figure 13.1: REINFORCE on the short-corridor grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_13_1.png)

3. [Figure 13.2: REINFORCE with baseline on the short-corridor grid-world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_13_2.png)

# Environment

* python 3.6 

* numpy

* matplotlib

* [seaborn](https://seaborn.pydata.org/index.html)

* [tqdm](https://pypi.org/project/tqdm/)

# Usage

> All files are self-contained

```commandline

python any_file_you_want.py

```

# Contribution

If you want to contribute some missing examples or fix some bugs, feel free to open an issue or make a pull request.