https://github.com/drewstone/openai-snake

Snake implementation and deep Q network for solving it.

neural-network openai-snake reinforcement-learning

Host: GitHub
URL: https://github.com/drewstone/openai-snake
Owner: drewstone
Created: 2020-03-22T04:05:49.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-06-22T01:57:31.000Z (over 2 years ago)
Last Synced: 2023-04-14T00:37:14.840Z (over 1 year ago)
Topics: neural-network, openai-snake, reinforcement-learning
Language: Python
Homepage:
Size: 43 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# openai-snake

[wip] Snake game implementation as a gym environment with a snake agent. The goal is to solve snake using a deep Q network.

# Structure
The repo is organized into the following files:
- `env.py` is responsible for the game state. It processes new actions by the snake, updates the game board, and produces the game board when the learning system needs it.
- `snake.py` is the agent. It contains the deep Q network and is responsible for all actions within the environment.
- `train.py` is the training procedure.
- `run.py` is the task runner. It runs the training procedure and initializes the snake and environment.
- `animate.py` is an animator built on a procedure similar to the task runner. It currently runs the training procedure and outputs `anim.mp4` to visualize the game dynamics.
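
As a rough illustration of how these pieces fit together, the sketch below shows a gym-style `reset`/`step` loop. The names (`ToySnakeEnv`, `run_episode`, `ACTIONS`), the reward values, and the return signature are assumptions for illustration, not the repo's actual code. The snake is reduced to a single cell, the prize is fixed in a corner, and a random policy stands in for the deep Q network.

```python
import random

# Hypothetical action encoding: (row, col) deltas for up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


class ToySnakeEnv:
    """Toy stand-in for the role env.py plays: it owns the game state."""

    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        # Simplification: one-cell snake in the centre, prize fixed in a corner.
        self.head = (self.size // 2, self.size // 2)
        self.prize = (0, 0)
        return self.board()

    def board(self):
        # Produce the board for the learner: 0 = empty, 1 = snake, 2 = prize.
        grid = [[0] * self.size for _ in range(self.size)]
        grid[self.prize[0]][self.prize[1]] = 2
        grid[self.head[0]][self.head[1]] = 1
        return grid

    def step(self, action):
        # Process an action, update the state, return (board, reward, done).
        dr, dc = ACTIONS[action]
        r, c = self.head[0] + dr, self.head[1] + dc
        if not (0 <= r < self.size and 0 <= c < self.size):
            return self.board(), -1.0, True  # hit a wall: life-ending move
        self.head = (r, c)
        if self.head == self.prize:
            return self.board(), 1.0, True  # simplification: stop at first prize
        return self.board(), 0.0, False


def run_episode(env, policy, max_steps=50):
    """Roughly the role of run.py and train.py: drive the agent through the env."""
    state = env.reset()
    total = 0.0
    for steps in range(1, max_steps + 1):
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total, steps


# A random policy stands in for the deep Q network in snake.py.
rng = random.Random(0)
total, steps = run_episode(ToySnakeEnv(), lambda state: rng.randrange(len(ACTIONS)))
print(total, steps)
```

Keeping the environment behind `reset` and `step` means the agent only ever sees boards, rewards, and a `done` flag, so the policy can be swapped without touching the game logic.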

# Bugs
There are a few known bugs in the repo, all marked with TODOs:
- The prize is sometimes placed on top of the snake. It should be sampled from all positions except those the snake currently occupies.
- The learning procedure currently discards the life-ending `(state, action, reward, new_state)` transition because the final states of the game are not modeled accurately. This should be fixed so that the learning procedure processes final game states, since they are the most heavily penalized situations in the game.
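
Both fixes can be sketched in a few lines. The helper names (`sample_prize`, `td_target`), the explicit `done` flag, and the discount value are assumptions for illustration, not code from the repo. The first function samples the prize only from unoccupied cells. The second computes the standard Q-learning target, which drops the bootstrap term on terminal transitions, so life-ending moves can stay in the training data without evaluating a final `new_state`.

```python
import random


def sample_prize(size, snake_cells, rng=random):
    """Pick a prize cell uniformly from the cells the snake does not occupy."""
    occupied = set(snake_cells)
    free = [(r, c) for r in range(size) for c in range(size)
            if (r, c) not in occupied]
    if not free:
        return None  # board is full, nowhere to place a prize
    return rng.choice(free)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: r if terminal, else r + gamma * max Q(s', a')."""
    if done:
        # No future return after a life-ending move, so new_state is not evaluated.
        return reward
    return reward + gamma * max(next_q_values)


snake = [(0, 0), (0, 1), (1, 1)]
prize = sample_prize(3, snake, random.Random(0))
print(prize not in snake)                        # True

print(td_target(-1.0, [0.5, 2.0], done=True))    # -1.0
print(td_target(0.0, [0.5, 2.0], done=False))    # 1.98
```

Storing a `done` flag alongside each transition is one common way to let a replay buffer hold terminal and non-terminal steps in the same format.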