https://github.com/Curt-Park/reinforcement_learning_an_introduction



# reinforcement_learning_an_introduction
Summary (**in Korean**) and Python implementation of 'Reinforcement Learning: An Introduction' by Sutton & Barto.

### [1. Introduction](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true)

* [1.1 Reinforcement Learning](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.1-Reinforcement-Learning)
* [1.2 Examples](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.2-Examples)
* [1.3 Elements of Reinforcement Learning](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.3-Elements-of-Reinforcement-Learning)
* [1.4 Limitations and Scope](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.4-Limitations-and-Scope)
* [1.5 An Extended Example: Tic-Tac-Toe](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.5-An-Extended-Example:-Tic-Tac-Toe)

### [2. Multi-armed Bandits](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true)

* [2.1 A k-armed Bandit Problem](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.1-k-armed-bandit-problem)
* [2.2 Action-value Methods](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.2-Action-value-Methods)
* [2.3 The 10-armed Testbed](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.3-The-10-armed-Testbed)
* [2.4 Incremental Implementation](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.4-Incremental-Implementation)
* [2.5 Tracking a Nonstationary Problem](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.5-Tracking-a-Nonstationary-Problem)
* [2.6 Optimistic Initial Values](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.6-Optimistic-Initial-Values)
* [2.7 Upper-Confidence-Bound Action Selection](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.7-Upper-Confidence-Bound-Action-Selection)
* [2.8 Gradient Bandit Algorithms](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.8-Gradient-Bandit-Algorithms)
* [2.9 Associative Search (Contextual Bandits)](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.9-Associative-Search-(Contextual-Bandits))

### [3. Finite Markov Decision Processes](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE)

* [3.1 The Agent-Environment Interface](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.1-The-Agent-Environment-Interface)
* [3.2 Goals and Rewards](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.2-Goals-and-Rewards)
* [3.3 Returns and Episodes](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.3-Returns-and-Episodes)
* [3.4 Unified Notation for Episodic and Continuing Tasks](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.4-Unified-Notation-for-Episodic-and-Continuing-Tasks)
* [3.5 Policies and Value Functions](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.5-Policies-and-Value-Functions)
* [3.6 Optimal Policies and Optimal Value Functions](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.6-Optimal-Policies-and-Optimal-Value-Functions)
* [3.7 Optimality and Approximation](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.7-Optimality-and-Approximation)