https://github.com/geyang/reinforcement_learning_learning_notes
- Host: GitHub
- URL: https://github.com/geyang/reinforcement_learning_learning_notes
- Owner: geyang
- Created: 2017-06-27T19:22:45.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-12-04T18:28:49.000Z (over 7 years ago)
- Last Synced: 2025-01-10T12:58:21.647Z (5 months ago)
- Language: Python
- Size: 45.2 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Reinforcement Learning Notes
My notes on reinforcement learning.
**Update**: I am implementing some new algorithms in private repos, so the list here is incomplete. I will come back to update this from time to time.
## Plans (2017-12-04)
- [ ] C51, distributional Q-learning (see the projection sketch after this list)
- [ ] Solve Montezuma with re-weighted sampling
- [ ] Move PPO into this repo
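The core of C51 is the categorical projection: after the Bellman update $r + \gamma z$, the shifted atoms no longer sit on the fixed support, so their mass is split between the two nearest atoms. Below is a minimal NumPy sketch of that step, not the repo's implementation; the function name, shapes, and defaults (51 atoms on $[-10, 10]$, as in the C51 paper) are illustrative assumptions.

```python
import numpy as np

def categorical_projection(rewards, dones, next_probs,
                           v_min=-10.0, v_max=10.0, n_atoms=51, gamma=0.99):
    """Project the Bellman-updated distribution r + gamma * z back onto the
    fixed support. Shapes: rewards (B,), dones (B,), next_probs (B, n_atoms)
    -- the target network's distribution at the greedy next action.
    (Illustrative sketch of the C51 projection step.)"""
    batch = rewards.shape[0]
    z = np.linspace(v_min, v_max, n_atoms)   # fixed atom locations
    dz = (v_max - v_min) / (n_atoms - 1)

    # Bellman update of each atom, clipped to the support.
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * z[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional index of each updated atom on the support.
    b = (tz - v_min) / dz
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)

    # Split each atom's probability mass between its two neighbors.
    proj = np.zeros((batch, n_atoms))
    for i in range(batch):
        for j in range(n_atoms):
            if lower[i, j] == upper[i, j]:   # atom fell exactly on the grid
                proj[i, lower[i, j]] += next_probs[i, j]
            else:
                proj[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                proj[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return proj   # (B, n_atoms) target for the cross-entropy loss
```

The projected distribution then serves as the target in a cross-entropy loss against the online network's predicted distribution.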
### Done
- [x] DQN
- [x] prioritized replay
- [x] double Q-learning (or half Q-learning; see the sketch after this list)
- [x] dueling networks
- [x] $\epsilon$-greedy with linear scheduling
- [x] Gradients, and REINFORCE algorithm
- [x] policy gradients
- [x] Setups
- [x] Get MuJoCo
- [x] set up OpenAI Gym on AWS (yay! :confetti_ball:)
- [x] install `MuJoCo` :confetti_ball:
- [x] install `mujoco-py` (needs upgrading to 1.50, which now supports Python 3.6)
- [x] make a list of concepts to keep track of
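Two small pieces from this list are compact enough to sketch here: the double-DQN target (the online network *selects* the next action, the target network *evaluates* it) and the linearly annealed $\epsilon$ schedule. This is a hedged NumPy sketch, not the repo's code; `q_online`, `q_target`, and all defaults are illustrative.

```python
import numpy as np

def double_dqn_targets(rewards, dones, next_states, q_online, q_target, gamma=0.99):
    """Double-DQN target: decoupling action selection (online net) from
    evaluation (target net) reduces the overestimation bias of the
    vanilla max-based DQN target.
    q_online/q_target: state batch (B, ...) -> action values (B, n_actions)."""
    next_actions = np.argmax(q_online(next_states), axis=1)                       # selection
    next_values = q_target(next_states)[np.arange(len(rewards)), next_actions]    # evaluation
    return rewards + gamma * (1.0 - dones) * next_values

def linear_epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Linearly annealed epsilon for epsilon-greedy exploration:
    decays from `start` to `end` over `anneal_steps`, then stays at `end`."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)
```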
### Backlog
- [ ] TRPO
- [ ] A3C
- [ ] Behavior Cloning
- [ ] DAgger (see the loop sketch after this list)
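DAgger's core loop is simple enough to sketch ahead of implementing it: roll out the *learner*, label every visited state with the *expert's* action, aggregate, retrain. This collects labels on the learner's own state distribution, which is what fixes behavior cloning's compounding-error problem. The sketch below assumes the classic 4-tuple Gym API of this repo's era; `learner`, `expert_policy`, and their methods are illustrative names, not an existing API.

```python
def dagger(env, learner, expert_policy, n_iters=10, horizon=1000):
    """DAgger (Ross et al., 2011), minimal outline.
    learner: has .predict(obs) -> action and .fit(states, actions).
    expert_policy: callable obs -> expert action (the label)."""
    states, actions = [], []
    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            act = learner.predict(obs)            # learner drives the rollout
            states.append(obs)
            actions.append(expert_policy(obs))    # expert provides the label
            obs, reward, done, info = env.step(act)
            if done:
                break
        learner.fit(states, actions)              # retrain on the aggregated dataset
    return learner
```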
### On How to Ask for Help
I found the textbook to be the most reliable source, but it's easy to get lost in the chapters. So the best way to ask for guidance seems to be:
> I'm reading Chapter xx and topic xx at the moment. What are the key things I should pay attention to?
### Reference Readings
- [ ] David Silver's RL course [index](david%20silver%20RL%20course/course%20index.md)
- [ ] Berkeley RL course [http://rll.berkeley.edu/deeprlcourse/](http://rll.berkeley.edu/deeprlcourse/)
- [x] The log-derivative trick: [http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/](http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/) (a minimal sketch follows this list)
- [ ] [https://arxiv.org/pdf/1506.05254.pdf](https://arxiv.org/pdf/1506.05254.pdf), a longer explanation of different viewpoints on taking derivatives.
- [x] Contextual bandits:
  - http://hunch.net/?p=298
  - https://getstream.io/blog/introduction-contextual-bandits/
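The log-derivative trick referenced above is what makes REINFORCE work: $\nabla_\theta \, \mathbb{E}_{p_\theta}[f(x)] = \mathbb{E}_{p_\theta}[f(x)\,\nabla_\theta \log p_\theta(x)]$, so the gradient of an expectation becomes an expectation we can estimate from samples. A minimal NumPy sketch for a categorical policy, where the gradient of $\log \pi(a \mid s)$ with respect to the logits is $\text{onehot}(a) - \pi(\cdot \mid s)$; this is an illustrative sketch, not the repo's implementation.

```python
import numpy as np

def score_function_gradient(logits, actions, returns):
    """Score-function (REINFORCE) estimator for a categorical policy.
    logits: (B, n_actions) pre-softmax scores, actions: (B,) sampled
    action indices, returns: (B,) sampled returns. Returns the Monte
    Carlo estimate of grad E[R] w.r.t. the logits."""
    # Numerically stable softmax.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # grad log pi(a | s) w.r.t. logits = onehot(a) - pi(. | s)
    onehot = np.zeros_like(probs)
    onehot[np.arange(len(actions)), actions] = 1.0
    grad_logp = onehot - probs

    # Average of return-weighted score functions.
    return (returns[:, None] * grad_logp).mean(axis=0)
```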
## Research Ideas
- Curiosity as reward
- Finding answers as reward
- inferring intention
- Learning to predict (lots of prior art. self-supervision)
- Auxiliary supervision and Auxiliary modalities.
- inverse reinforcement learning != imitation learning