
https://github.com/gunh0/reinforcement-learning-cartpole-balancing

📢 2019 Microsoft Student Partners (MSP) Evangelism Seminar - 2019.03.31

artificial-intelligence cartpole microsoft-student-partners msp reinforcement-learning


### 2019 Microsoft Student Partners (MSP) Evangelism Seminar

**Getting Started with Reinforcement Learning with OpenAI Gym**

**2019. 03. 31**

![msp-logo.png](./docs/image/msp-logo.png)

---

**The Cart Pole balancing problem is a standard problem in the field of control strategies that use genetic algorithms, artificial neural networks, reinforcement learning, and similar methods.**

![cartpole-task.gif](/docs/image/cartpole-task.gif)

### Result (legacy)

![result-old.png](./docs/image/result-old.png)


### Last Updated (2024. 01.)

- Python 3.11.9

This tutorial shows how to use PyTorch to train a Deep Q-Network (DQN) agent on the CartPole-v1 task from Gymnasium.

![output.png](./docs/image/output.png)

**Diagram**

![diagram.jpg](./docs/image/diagram.jpg)

Actions are chosen either randomly or according to the policy, and each action yields the next step sample from the Gym environment. We record the results in the replay memory and run an optimization step on every iteration. Optimization picks a random batch from the replay memory to train the new policy. The "older" target_net is also used in optimization to compute the expected Q values. A soft update of its weights is performed at every step.
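The three moving parts described above (replay memory, random-or-policy action selection, and the soft update of the target weights) can be sketched in plain Python. This is a minimal illustration and not the repository's code: the names `ReplayMemory`, `epsilon_threshold`, `select_action`, and `soft_update` are hypothetical, the hyperparameter defaults (`eps_start`, `eps_end`, `eps_decay`, `tau`) are assumed values, and the network weights are stood in for by a dict of floats rather than PyTorch tensors.

```python
import math
import random
from collections import deque, namedtuple

# One environment step, as stored in the replay memory.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped once it is full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random batch, drawn without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_threshold(steps_done, eps_start=0.9, eps_end=0.05, eps_decay=1000):
    """Exploration rate decaying exponentially from eps_start towards eps_end.

    The default values are assumptions chosen for illustration.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-steps_done / eps_decay)


def select_action(q_values, steps_done, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon_threshold(steps_done):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def soft_update(target_weights, policy_weights, tau=0.005):
    """Return blended target weights: tau * policy + (1 - tau) * target.

    Weights are plain floats keyed by name here (an assumption); a real
    agent would apply the same rule to each tensor of the network.
    """
    return {
        name: tau * policy_weights[name] + (1.0 - tau) * target_weights[name]
        for name in target_weights
    }
```

In a training loop, each step would call `select_action`, `push` the resulting transition, draw a batch with `sample` once enough transitions are stored, and finish with `soft_update`. Because `tau` is small, the target weights trail the policy weights slowly, which keeps the expected Q values from shifting abruptly between optimization steps.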