# *Reinforcement Learning: An Introduction*

One text that is widely regarded as the "industry standard" in reinforcement learning is Sutton and Barto's *Reinforcement Learning: An Introduction.* Here you will find the supporting source code for the Jupyter notebooks found on [my website](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/), as well as in the links below. My hope is that the code and the insights offered in these notebooks will help a casual reader better understand the power of reinforcement learning.

### [Chapter 2 - Multi-Arm Bandits](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Multi_Arm_Bandits/)
This section of the book is dedicated to framing the principles of optimal control and reward maximization through the illustration of the "multi-armed bandit." More is discussed in the accompanying [Multi-Arm Bandit Jupyter Notebook](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Multi_Arm_Bandits/). The standalone Python source code can be found [here](https://github.com/levimcclenny/Reinforcement_Learning).
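
As a concrete companion to that discussion, here is a minimal sketch (not the notebook's code) of the epsilon-greedy, sample-average method from this chapter run on a 10-armed Gaussian testbed; the function name and default parameters are illustrative choices, not anything prescribed by the book or this repository.

```python
import numpy as np

def epsilon_greedy_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent on a k-armed Gaussian testbed."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0, 1, k)   # true action values, unknown to the agent
    Q = np.zeros(k)                # sample-average value estimates
    N = np.zeros(k)                # pull counts per arm
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < epsilon:                       # explore
            a = rng.integers(k)
        else:                                            # exploit, ties broken randomly
            a = rng.choice(np.flatnonzero(Q == Q.max()))
        r = rng.normal(q_true[a], 1)                     # noisy reward from the chosen arm
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                        # incremental sample-average update
        rewards[t] = r
    return rewards, Q

rewards, Q = epsilon_greedy_bandit()
print(f"Average reward over the run: {rewards.mean():.3f}")
```

The incremental update `Q[a] += (r - Q[a]) / N[a]` is the standard trick from the chapter: it keeps a running mean without storing the full reward history.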

### [Chapter 3 - Finite Markov Decision Processes](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Finite_MDPs/)
Here we begin to formulate the reinforcement learning problem, starting with Markov chains and moving into Markov decision processes. We evaluate random and optimal value functions using the Gridworld example outlined in the text. Some of the more important takeaways are outlined in the [Finite Markov Decision Processes Jupyter Notebook](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Finite_MDPs/) and the source code is available [here](https://github.com/levimcclenny/Reinforcement_Learning).
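
To make the value-function evaluation concrete, below is a minimal sketch of iterative policy evaluation for the equiprobable random policy on the 5×5 Gridworld from the text; the special states A/B and their rewards follow the book's example, but the code structure itself is my illustrative assumption, not the notebook's implementation.

```python
import numpy as np

# 5x5 Gridworld from the text: from A any action jumps to A' with reward +10,
# from B to B' with +5; moves off the grid give -1 and leave the state unchanged;
# all other moves give 0.
N, GAMMA = 5, 0.9
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0  # bumped the wall

# Iterative policy evaluation for the equiprobable random policy.
V = np.zeros((N, N))
while True:
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            for a in ACTIONS:
                (nr, nc), reward = step((r, c), a)
                V_new[r, c] += 0.25 * (reward + GAMMA * V[nr, nc])
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-6:
        break

print(np.round(V, 1))  # should approximate the random-policy value table in the text
```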

### [Chapter 4 - Dynamic Programming](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Dynamic_Programming/)
This chapter offers insights into some popular, albeit fairly basic, dynamic programming algorithms used to evaluate MDPs when all the state-action dynamics are known. In this chapter we know all the transition probabilities, the rewards, etc., so there isn't much to "figure out" as far as the agent-environment interface is concerned. However, don't mistake that description for a lack of applications: as you can see in the [Dynamic Programming Jupyter Notebook](http://people.tamu.edu/~levimcclenny/project/reinforcement-learning/Barto_Sutton_RL/Dynamic_Programming/) and in the text for this chapter, these algorithms apply to real-life problems and handle subtle non-linearities that traditional optimization algorithms might struggle with, which is why they are presented here and remain in widespread use. As always, the source code can be found on [my GitHub](https://github.com/levimcclenny/Reinforcement_Learning).
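
For a flavor of what these algorithms look like when the dynamics are fully known, here is a minimal value-iteration sketch over explicit transition and reward tables; the two-state MDP at the bottom is entirely made up for illustration, and none of the names here come from the book or the notebooks.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP with known dynamics.

    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    Returns the optimal value function and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Toy 2-state, 2-action MDP with hand-picked dynamics (illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V_star, pi_star = value_iteration(P, R)
print("V* =", np.round(V_star, 3), " greedy policy =", pi_star)
```

Because the full model (`P`, `R`) is available, each sweep is just a batched Bellman optimality backup; this is exactly the "known dynamics" setting that separates the DP methods of this chapter from the sampling-based methods later in the book.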

Obligatory disclaimer: This is not original research, but rather my insights into this incredible book.