https://github.com/ttitcombe/dqn

PyTorch implementation of Deep Q Learning
cartpole deep-learning deep-q-learning deep-q-network deep-reinforcement-learning dqn dqn-pytorch openai-gym pytorch pytorch-implementation reinforcement-learning

Host: GitHub
URL: https://github.com/ttitcombe/dqn
Owner: TTitcombe
License: mit
Created: 2019-07-17T07:25:23.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-01-06T22:06:34.000Z (almost 5 years ago)
Last Synced: 2024-11-07T06:32:53.337Z (about 2 months ago)
Topics: cartpole, deep-learning, deep-q-learning, deep-q-network, deep-reinforcement-learning, dqn, dqn-pytorch, openai-gym, pytorch, pytorch-implementation, reinforcement-learning
Language: Python
Homepage:
Size: 24.2 MB
Stars: 2
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# DQN

![DDQN CarRacing](results/videos/carracing_good.gif)

This is a basic implementation of Deep Q Learning. We have implemented linear and convolutional models, which can be trained with the DQN and double DQN (DDQN) algorithms.
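The difference between the DQN and double DQN algorithms lies in how the bootstrap target is computed. The following is a minimal sketch in plain Python for a single transition, not the code used in this repository: the function names, the `gamma` default, and the Q-values are all made up for illustration.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state, with three actions.
next_q_online = [1.0, 2.5, 0.3]  # online network prefers action 1
next_q_target = [1.2, 0.8, 3.0]  # target network prefers action 2

dqn_y = dqn_target(1.0, next_q_target, done=False)
ddqn_y = double_dqn_target(1.0, next_q_online, next_q_target, done=False)
print(dqn_y, ddqn_y)
```

With these numbers the double DQN target is smaller than the DQN target, because the action chosen by the online network is not the one the target network scores highest. Decoupling selection from evaluation in this way is what is meant to reduce overestimation of Q-values.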

Next steps:

* Model trained on Atari games

* Prioritized Experience Replay

## To run

To begin, set up [OpenAI Gym](https://gym.openai.com/) and install the packages in `requirements.txt`.

We have an example script which trains a model on the CartPoleSwingUp environment (this requires gym <= 0.9.4).

Run `python -m examples.cartpoleswingup_linear` in the top-level directory.

(To run Box2D environments, I used [this Docker container](https://github.com/TTitcombe/docker_openai_gym). Check it out if you are also having problems installing Gym.)

## Results

The best models trained on each env are present in `results/models/`. There you will find the saved PyTorch model as a `.pth` file and a graph comparing the reward per episode against random play.

**Linear models**

| Model | Env               |   Score   |
|-------|-------------------|:---------:|
| DQN   | CartPole-v1       | 500       |
| DQN   | CartPoleSwingUp\* | 872 +/- 3 |

You can see how dependent the linear model is on various hyperparameters in the following graph:

![DQN linear investigation](results/CartpoleSwingUp_investigation.png)

\**Note: While we can train high-performing models on CartPoleSwingUp, training is very unstable, even when training for millions of frames and with a large-capacity (100000) replay memory. It is not clear why this is the case.*
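The note above refers to the capacity of the replay memory. As a rough illustration of what that capacity controls, here is a minimal sketch of a fixed-size replay memory using only the standard library. The `ReplayMemory` and `Transition` names are hypothetical and are not taken from this repository, whose implementation may differ.

```python
import random
from collections import deque, namedtuple

# One stored step of experience (field names are illustrative).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayMemory:
    """Fixed-capacity buffer: once full, the oldest transition is discarded."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


# Tiny capacity so the eviction behaviour is easy to see.
memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = memory.sample(2)
print(len(memory), [t.state for t in memory.buffer])
```

A larger capacity keeps older experience available for sampling for longer, which is why it is one of the hyperparameters worth varying when training is unstable.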

**Conv model**

| Model | Env | Score |
|-------|-----|:-----:|

![DQN_CartPoleSwingUp_Example](results/videos/double_dqn_cartpoleswingup.gif)