https://github.com/animesh-chourey/deep-q-network-reinforcement-learning
atari-breakout deep-q-network deep-reinforcement-learning keras openai-gym reinforcement-learning
- Host: GitHub
- URL: https://github.com/animesh-chourey/deep-q-network-reinforcement-learning
- Owner: Animesh-Chourey
- Created: 2022-08-29T11:51:24.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-08-29T13:11:24.000Z (over 2 years ago)
- Last Synced: 2024-08-05T12:56:55.465Z (5 months ago)
- Topics: atari-breakout, deep-q-network, deep-reinforcement-learning, keras, openai-gym, reinforcement-learning
- Language: Python
- Homepage:
- Size: 286 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Atari Breakout using Reinforcement Learning
A Deep Q-Network (DQN) is trained to play Atari Breakout. The implementation is based on [OpenAI baselines](https://github.com/openai/baselines), and the Atari Breakout environment is created using OpenAI Gym.
The following changes have been made to the baseline implementation:
* The Adam optimiser has been replaced with RMSprop.
* The learning rate is set to 0.0001 and the discount factor (rho) to 0.99.
* To reduce memory usage, the size of the replay buffer is reduced from 100000 to 10000.
* A batch is drawn and the network is updated every four frames, but only once the replay buffer is full (in the baseline implementation, a batch is drawn and the network is updated as soon as the replay buffer size is larger than the batch size).
* Instead of storing the return of the last 100 episodes, the return of every episode is stored.

### Testing
The model trained with the settings described above is loaded for testing. The Atari Breakout environment is created with the same wrapper that was used for training. Ten episodes are recorded using the greedy policy derived from the trained Deep Q-Network.
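
As a rough illustration of two rules described in this README, the sketch below shows when a training update would be triggered (every four frames, and only once the replay buffer is full) and how a greedy policy picks an action at test time. It is a minimal standalone sketch, not the repository's code: the function names `should_update` and `greedy_action` are hypothetical, the batch size of 32 is an assumption (the README does not state it), and the network update and the Q-values are replaced by placeholders.

```python
import random
from collections import deque

# Values taken from the README.
REPLAY_CAPACITY = 10_000  # replay buffer size (reduced from 100000)
UPDATE_EVERY = 4          # the network is updated every four frames

# Assumption: the README does not state the batch size.
BATCH_SIZE = 32


def should_update(frame_count, buffer_len, wait_until_full=True):
    """Return True when a batch should be drawn and the network updated."""
    if frame_count % UPDATE_EVERY != 0:
        return False
    if wait_until_full:
        # Modified rule: wait until the replay buffer is full.
        return buffer_len >= REPLAY_CAPACITY
    # Baseline rule: the buffer only has to be larger than one batch.
    return buffer_len > BATCH_SIZE


def greedy_action(q_values):
    """Greedy policy: return the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    replay_buffer = deque(maxlen=REPLAY_CAPACITY)
    updates = 0
    for frame in range(1, 10_021):
        # Placeholder transition; a real agent would store
        # (state, action, reward, next_state, done).
        replay_buffer.append(frame)
        if should_update(frame, len(replay_buffer)):
            batch = random.sample(list(replay_buffer), BATCH_SIZE)
            updates += 1  # a real agent would train on `batch` here
    print("updates:", updates)
    print("greedy action:", greedy_action([0.1, 0.7, 0.3, 0.2]))
```

Passing `wait_until_full=False` reproduces the baseline condition described above, which makes it easy to compare how much earlier the baseline would start updating the network.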