https://github.com/Curt-Park/reinforcement_learning_an_introduction
Summary (in Korean) and Python implementation of 'Reinforcement Learning: An Introduction' by Sutton & Barto
- Host: GitHub
- URL: https://github.com/Curt-Park/reinforcement_learning_an_introduction
- Owner: Curt-Park
- Created: 2018-04-20T08:55:01.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-08-05T01:54:42.000Z (about 7 years ago)
- Last Synced: 2024-08-03T01:15:55.037Z (about 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 5.55 MB
- Stars: 56
- Watchers: 9
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# reinforcement_learning_an_introduction
Summary (**in Korean**) and Python implementation of 'Reinforcement Learning: An Introduction' by Sutton & Barto.

### [1. Introduction](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true)
* [1.1 Reinforcement Learning](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.1-Reinforcement-Learning)
* [1.2 Examples](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.2-Examples)
* [1.3 Elements of Reinforcement Learning](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.3-Elements-of-Reinforcement-Learning)
* [1.4 Limitations and Scope](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.4-Limitations-and-Scope)
* [1.5 An Extended Example: Tic-Tac-Toe](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch01_introduction/introduction.ipynb?flush_cache=true#1.5-An-Extended-Example:-Tic-Tac-Toe)

### [2. Multi-armed Bandits](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true)
* [2.1 k-armed bandit problem](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.1-k-armed-bandit-problem)
* [2.2 Action-value Methods](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.2-Action-value-Methods)
* [2.3 The 10-armed Testbed](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.3-The-10-armed-Testbed)
* [2.4 Incremental Implementation](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.4-Incremental-Implementation)
* [2.5 Tracking a Nonstationary Problem](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.5-Tracking-a-Nonstationary-Problem)
* [2.6 Optimistic Initial Values](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.6-Optimistic-Initial-Values)
* [2.7 Upper-Confidence-Bound Action Selection](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.7-Upper-Confidence-Bound-Action-Selection)
* [2.8 Gradient Bandit Algorithms](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.8-Gradient-Bandit-Algorithms)
* [2.9 Associative Search (Contextual Bandits)](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch02_multi-armed_bandits/multi-armed_bandits.ipynb?flush_cache=true#2.9-Associative-Search-(Contextual-Bandits))

### [3. Finite Markov Decision Processes](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE)
* [3.1 The Agent-Environment Interface](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.1-The-Agent-Environment-Interface)
* [3.2 Goals and Rewards](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.2-Goals-and-Rewards)
* [3.3 Returns and Episodes](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.3-Returns-and-Episodes)
* [3.4 Unified Notation for Episodic and Continuing Tasks](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.4-Unified-Notation-for-Episodic-and-Continuing-Tasks)
* [3.5 Policies and Value Functions](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.5-Policies-and-Value-Functions)
* [3.6 Optimal Policies and Optimal Value Functions](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.6-Optimal-Policies-and-Optimal-Value-Functions)
* [3.7 Optimality and Approximation](https://nbviewer.jupyter.org/github/Curt-Park/reinforcement_learning_an_introduction/blob/master/ch03_finite_markov_decision_processes/finite_markov_decision_processes.ipynb?flush_cache=TRUE#3.7-Optimality-and-Approximation)
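As a flavor of what the notebooks cover, here is a minimal, self-contained sketch of the ε-greedy action-value method with incremental sample-average updates (sections 2.2 and 2.4). This is an illustrative example written for this summary, not code taken from the notebooks; the function name and parameters are hypothetical.

```python
import random

def run_bandit(q_true, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy k-armed bandit with incremental mean updates.

    Illustrative sketch (not the repository's code): q_true holds the
    true mean reward of each arm; rewards are q_true[a] plus unit
    Gaussian noise.
    """
    rng = random.Random(seed)
    k = len(q_true)
    q_est = [0.0] * k   # action-value estimates Q(a)
    counts = [0] * k    # pull counts N(a)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                      # explore
        else:
            a = max(range(k), key=lambda i: q_est[i])  # exploit
        r = q_true[a] + rng.gauss(0, 1)               # noisy reward
        counts[a] += 1
        # Incremental sample average (section 2.4): Q <- Q + (R - Q) / N
        q_est[a] += (r - q_est[a]) / counts[a]
    return q_est, counts

q_est, counts = run_bandit([0.2, 0.8, -0.5])
```

After 1,000 steps the estimate for the best arm should be close to its true value, and that arm should have been pulled most often.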