https://github.com/gunh0/reinforcement-learning-cartpole-balancing
2019 Microsoft Student Partners (MSP) Evangelism Seminar - 2019.03.31
- Host: GitHub
- URL: https://github.com/gunh0/reinforcement-learning-cartpole-balancing
- Owner: gunh0
- License: mit
- Created: 2019-03-31T12:16:30.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-03T05:18:21.000Z (about 1 year ago)
- Last Synced: 2024-12-03T12:11:10.611Z (10 months ago)
- Topics: artificial-intelligence, cartpole, microsoft-student-partners, msp, reinforcement-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 5.73 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### 2019 Microsoft Student Partners (MSP) Evangelism Seminar
**Getting Started with Reinforcement Learning with OpenAI Gym**
**2019. 03. 31**

---
**The Cart Pole balancing problem is a standard problem in the field of control strategies using genetic algorithms, artificial neural networks, and reinforcement learning.**

### Result (legacy)

### Last Updated (2024. 01.)
- Python 3.11.9
This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v1 task from Gymnasium.
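As a minimal sketch of the environment side of that setup, the snippet below creates the CartPole-v1 task through the Gymnasium API and runs one episode with a random placeholder policy; the variable names are illustrative and not taken from this repository's notebook.

```python
import gymnasium as gym

# Create the CartPole-v1 environment from Gymnasium.
env = gym.make("CartPole-v1")

# Run one episode with a random policy as a placeholder for the trained DQN agent.
state, info = env.reset(seed=42)
done = False
episode_return = 0.0

while not done:
    action = env.action_space.sample()  # random action; a trained agent would pick argmax_a Q(s, a)
    state, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

env.close()
print(f"Episode return: {episode_return}")
```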

**Diagram**

Actions are chosen either randomly or based on a policy, getting the next step sample from the gym environment. We record the results in the replay memory and also run optimization step on every iteration. Optimization picks a random batch from the replay memory to do training of the new policy. The βolderβ target_net is also used in optimization to compute the expected Q values. A soft update of its weights are performed at every step.