https://github.com/scitator/rl-intro
- Host: GitHub
- URL: https://github.com/scitator/rl-intro
- Owner: Scitator
- License: apache-2.0
- Created: 2019-03-02T07:14:28.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-03-11T16:13:59.000Z (about 6 years ago)
- Last Synced: 2025-05-01T18:07:40.514Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 17.2 MB
- Stars: 23
- Watchers: 2
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## RL Intro
### 2019 edition - Gym intro, Genetics, CEM, Tabular, DQN
#### 0. Gym interface
- `00-gym.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/00-gym.ipynb)
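
For readers who want a feel for the interface before opening the notebook, here is a minimal sketch of the `reset()` / `step()` interaction loop. It does not import `gym`: `CorridorEnv` and `run_episode` are hypothetical names invented for this illustration, and the environment only mimics the classic four-value `step` return `(observation, reward, done, info)` used by Gym releases around 2019. Newer Gym and Gymnasium releases return five values from `step` and a pair from `reset`, so treat this as an outline of the idea rather than the notebook's code.

```python
import random


class CorridorEnv:
    """Toy stand-in for a Gym environment (hypothetical, not part of gym).

    The agent starts in cell 0 of a 1-D corridor and must reach the last cell.
    Actions: 0 = move left, 1 = move right.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CorridorEnv()
    print("always right:", run_episode(env, lambda obs: 1))
    print("random policy:", run_episode(env, lambda obs: random.choice([0, 1])))
```

The same loop shape (reset, then repeat "pick action, step, accumulate reward" until `done`) is what the later sections build on.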
#### 1. Genetic algorithm
- [slides](./2019/slides/01-genetics.pdf)
- `01-genetics.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/01-genetics.ipynb)
##### Additional materials
* __[recommended]__ - awesome OpenAI post about evolution strategies - [blog post](https://blog.openai.com/evolution-strategies/), [article](https://arxiv.org/abs/1703.03864)
* Video on genetic algorithms - https://www.youtube.com/watch?v=ejxfTy4lI6I
* Another guide to genetic algorithms - https://www.youtube.com/watch?v=zwYV11a__HQ
* PDF on Differential evolution - http://jvanderw.une.edu.au/DE_1.pdf
* Video on Ant Colony Algorithm - https://www.youtube.com/watch?v=D58nLNLkb0I
* Longer video on Ant Colony Algorithm - https://www.youtube.com/watch?v=xpyKmjJuqhk
#### 2. Cross Entropy Method
- [slides](./2019/slides/02-cem.pdf)
- `02-cem.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/02-cem.ipynb)
##### Additional materials
* __[main]__ Video-intro by David Silver - https://www.youtube.com/watch?v=2pWv7GOvuf0
* Optional lecture by David Silver - https://www.youtube.com/watch?v=lfHX2hHRMVQ
* __[recommended]__ - formal explanation of the cross-entropy method in [general](https://people.smp.uq.edu.au/DirkKroese/ps/CEEncycl.pdf) and for [optimization](https://people.smp.uq.edu.au/DirkKroese/ps/CEopt.pdf)
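
As a rough illustration of the cross-entropy method for optimization, the sketch below maximizes a one-dimensional function by repeatedly sampling from a Gaussian, keeping the best-scoring samples, and refitting the Gaussian to them. The function names, the toy objective, and all hyperparameters are assumptions made for this example, and it is not taken from the course notebook, which applies the method to RL policies.

```python
import random
import statistics


def cross_entropy_maximize(objective, mean=0.0, std=5.0, population=50,
                           elite_fraction=0.2, iterations=30, seed=0):
    """Maximize `objective` over the reals with a Gaussian sampling distribution."""
    rng = random.Random(seed)
    n_elite = max(2, int(population * elite_fraction))
    for _ in range(iterations):
        # 1. Sample candidate solutions from the current distribution.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # 2. Keep the elite: the candidates with the highest objective value.
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        # 3. Refit the distribution to the elite set.
        mean = statistics.mean(elite)
        std = statistics.stdev(elite) + 1e-3  # small floor so sampling never degenerates
    return mean


def toy_objective(x):
    """Hypothetical objective with a single maximum at x = 3."""
    return -(x - 3.0) ** 2


if __name__ == "__main__":
    print("estimated maximizer:", cross_entropy_maximize(toy_objective))
```

In the RL setting the "samples" are typically policy parameters or whole episodes and the "objective" is the episode return, but the sample, select, refit loop is the same.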
#### 3. Tabular
- [slides](./2019/slides/03-tabular.pdf)
- `03-tabular.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/03-tabular.ipynb)
##### Additional materials
* __[main]__ lecture by David Silver - [url](https://www.youtube.com/watch?v=Nd1-UUMVfz4)
* Alternative lecture by Pieter Abbeel: [part 1](https://www.youtube.com/watch?v=i0o-ui1N35U), [part 2](https://www.youtube.com/watch?v=Csiiv6WGzKM)
* Alternative lecture by John Schulman: https://www.youtube.com/watch?v=IL3gVyJMmhg
* Definitive guide to policy/value iteration from Sutton's book: start from page 81 [here](http://incompleteideas.net/sutton/book/bookdraft2017june19.pdf).
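
To make the value iteration idea from these materials concrete, here is a small sketch on a made-up three-state MDP. The `transitions` layout, the function names, and the toy MDP itself are assumptions for illustration only. The update applied to every non-terminal state is the Bellman optimality backup, `V(s) = max_a sum_s' p(s'|s,a) * (r + gamma * V(s'))`.

```python
def action_value(outcomes, values, gamma):
    """Expected return of one action: sum over (probability, next_state, reward)."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tolerance=1e-8):
    """transitions[state][action] is a list of (probability, next_state, reward)."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best  # in-place update
        if delta < tolerance:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each non-terminal state, the action with the highest backed-up value."""
    return {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }


# Hypothetical chain MDP: 0 <-> 1 -> 2 (terminal); entering state 2 pays reward 1.
TOY_MDP = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}

if __name__ == "__main__":
    values = value_iteration(TOY_MDP)
    print(values)
    print(greedy_policy(TOY_MDP, values))
```

With `gamma = 0.9`, state 1 is one step from the reward and state 0 is two steps away, so their values differ by exactly one discount factor.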
#### 4. DQN
- [slides](./2019/slides/04-dqn.pdf)
- `04-dqn.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/04-dqn.ipynb)
##### Additional materials
* Lecture by David Silver - [video part I](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [video part II](https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=43s)
* Alternative lecture by Pieter Abbeel - [video](https://www.youtube.com/watch?v=ifma8G7LegE)
* Alternative lecture by John Schulman - [video](https://www.youtube.com/watch?v=IL3gVyJMmhg)
* Blog post on Q-learning vs. SARSA - [url](https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/)
* N-step temporal difference from Sutton's book - [suttonbook](http://incompleteideas.net/book/RLbook2018.pdf) __chapter 7__
* Eligibility traces from Sutton's book - [suttonbook](http://incompleteideas.net/book/RLbook2018.pdf) __chapter 12__
* Blog post on eligibility traces - [url](http://pierrelucbacon.com/traces/)
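
The Q-learning vs. SARSA distinction mentioned above comes down to one term in the update target. The sketch below shows both tabular updates side by side. The function names, the dictionary-based Q-table, and the example numbers are assumptions for illustration, not code from the notebook. Q-learning bootstraps from the best next action (off-policy), while SARSA bootstraps from the action the agent actually takes next (on-policy).

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Off-policy target: reward + gamma * max over next actions."""
    target = reward + gamma * max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """On-policy target: reward + gamma * Q of the action actually chosen next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


def make_q_table():
    """Hypothetical starting Q-table: unseen pairs default to 0."""
    q = defaultdict(float)
    q[("s1", "a")] = 1.0
    q[("s1", "b")] = 3.0
    return q


if __name__ == "__main__":
    actions = ["a", "b"]

    q = make_q_table()
    q_learning_update(q, "s0", "a", 0.0, "s1", actions)
    print("Q-learning:", q[("s0", "a")])  # uses max(1.0, 3.0) as the bootstrap value

    q = make_q_table()
    sarsa_update(q, "s0", "a", 0.0, "s1", next_action="a")
    print("SARSA:", q[("s0", "a")])  # uses Q(s1, a) = 1.0 as the bootstrap value
```

On the same transition the two rules produce different estimates whenever the next action taken is not the greedy one, which is exactly the case the linked blog post explores.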
### 2020 edition - Deep RL, DQN, DDPG
### Credits
* [Berkeley CS188x](http://ai.berkeley.edu/home.html)
* [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
* [dennybritz/reinforcement-learning](https://github.com/dennybritz/reinforcement-learning)
* [yandexdataschool/Practical_RL](https://github.com/yandexdataschool/Practical_RL)
* [yandexdataschool/AgentNet](https://github.com/yandexdataschool/AgentNet)
* [rl-course-experiments](https://github.com/Scitator/rl-course-experiments)
* [RL-Adventure](https://github.com/higgsfield/RL-Adventure)
* [RL-Adventure-2](https://github.com/higgsfield/RL-Adventure-2)