Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Snake implementation and deep Q network for solving it.
- Host: GitHub
- URL: https://github.com/drewstone/openai-snake
- Owner: drewstone
- Created: 2020-03-22T04:05:49.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-06-22T01:57:31.000Z (over 2 years ago)
- Last Synced: 2023-04-14T00:37:14.840Z (over 1 year ago)
- Topics: neural-network, openai-snake, reinforcement-learning
- Language: Python
- Homepage:
- Size: 43 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# openai-snake
[wip] Snake game implementation as a gym environment with a snake agent. The goal is to solve snake using a deep Q network.
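The pieces fit together through a gym-style reset/step loop. As a rough, self-contained illustration of that loop (the class and method names here are hypothetical stand-ins, not the repo's actual API):

```python
import random

class ToySnakeEnv:
    """Minimal stand-in for a gym-style snake environment (interface assumed)."""

    def __init__(self, size=6):
        self.size = size
        self.reset()

    def reset(self):
        # Start a one-segment snake in the middle of the board.
        self.snake = [(self.size // 2, self.size // 2)]
        self.prize = (0, 0)
        self.done = False
        return self._state()

    def _state(self):
        return (self.snake[0], self.prize)

    def step(self, action):
        # action: 0=up, 1=down, 2=left, 3=right
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r, c = self.snake[0]
        r, c = r + dr, c + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            self.done = True
            return self._state(), -1.0, True  # hitting a wall ends the episode
        self.snake.insert(0, (r, c))
        reward = 0.0
        if (r, c) == self.prize:
            reward = 1.0
            # Naive respawn; can land on the snake (see the Bugs section).
            self.prize = (random.randrange(self.size), random.randrange(self.size))
        else:
            self.snake.pop()  # no food eaten, so the tail moves forward
        return self._state(), reward, self.done

# A random agent stands in for the deep Q network here.
env = ToySnakeEnv()
state = env.reset()
total = 0.0
for _ in range(20):
    state, reward, done = env.step(random.randrange(4))
    total += reward
    if done:
        state = env.reset()
```

In the actual repo this loop is split across the files described below.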
# Structure
The repo is organized into a few files:
- `env.py` is responsible for the game state. It processes new actions by the snake, updates the game board, and produces the game board when the learning system needs it.
- `snake.py` is the agent. It contains the deep Q network and is responsible for all actions within the environment.
- `train.py` is the training procedure.
- `run.py` is the task runner. It runs the training procedure and initializes the snake and environment.
- `animate.py` is an animator over a procedure similar to the task runner. It currently runs the training procedure and outputs an `anim.mp4` to visualize the game dynamics.

# Bugs
There are a few bugs currently in the repo, all marked with TODOs:
- The prize sometimes gets placed on top of the snake. It should be sampled from all positions except those currently occupied by the snake.
- The learning procedure currently discards the episode-ending `(state, action, reward, new_state)` transition because the final states of the game are not modeled accurately. This should be fixed so that terminal transitions are processed, since they carry the largest penalties and are therefore among the most informative experiences to learn from.
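The first bug has a standard fix: sample the prize uniformly from the set of unoccupied cells rather than from the whole board. A hedged sketch (function and parameter names are hypothetical, not the repo's):

```python
import random

def place_prize(board_size, occupied):
    """Sample a prize cell uniformly from all cells not covered by the snake.

    `board_size` is the side length of a square board; `occupied` is an
    iterable of (row, col) cells currently occupied by the snake.
    """
    occupied = set(occupied)
    free = [(r, c)
            for r in range(board_size)
            for c in range(board_size)
            if (r, c) not in occupied]
    if not free:  # the snake fills the board: nowhere left to place a prize
        return None
    return random.choice(free)

# The prize can never land on the snake:
snake = [(0, 0), (0, 1), (1, 1)]
prize = place_prize(3, snake)
```

Building the free-cell list is O(board²) per placement, which is negligible for a small snake board.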
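For the second bug, the usual DQN remedy is to keep terminal transitions and simply zero out the bootstrap term when the episode has ended, so the target reduces to the (typically negative) terminal reward. A minimal sketch of that target computation, assuming a `done` flag is stored with each transition (names are illustrative):

```python
def q_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target for a (state, action, reward, new_state, done) transition.

    When the episode ended, there is no successor state to bootstrap from,
    so the target is just the terminal reward; otherwise we bootstrap from
    the best action value in the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Terminal transition: the death penalty is learned directly.
death_target = q_target(-10.0, [0.0, 0.0], done=True)

# Non-terminal transition: discounted bootstrap from the next state.
step_target = q_target(0.0, [1.0, 2.0], done=False)
```

With this in place, the life-ending transitions no longer need to be discarded; they become ordinary training samples whose targets are well defined.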