# SnakeAI

Training an AI to play the game of Snake. With reinforcement learning (distributed A2C), it learns to play a perfect game and score the maximum number of points.

## Overview

An AI learning to play Snake "from pixels" with TensorFlow 2.0.

![](snake-animation.gif)

## Requirements

Python 3 and TensorFlow 2.0 (beta or later)

## Usage

To train the AI, simply type:

```
$ python src/train.py
```

The agent can be trained multiple times and will keep improving; its state is saved automatically between runs.
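
The README does not say how this persistence works; in TF 2.x a natural mechanism is `tf.train.Checkpoint` with a `CheckpointManager`, so the sketch below is an assumption rather than the repository's actual code:

```
# Assumed persistence mechanism (tf.train.Checkpoint); the repo may differ.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
optimizer = tf.keras.optimizers.Adam()

# Bundle everything whose state should survive across training runs.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, "./checkpoints", max_to_keep=3)

# Restore the latest checkpoint if one exists (a no-op on the first run).
ckpt.restore(manager.latest_checkpoint)

# ... training steps go here ...

manager.save()  # persist weights and optimizer state for the next run
```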

If you want to watch your trained AI play the game:

```
$ python src/play.py
```

The repository also contains a pre-trained AI (trained on 1 GPU and 12 CPUs). To watch it play, type:

```
$ python src/play_pretrained.py
```

## Implementation details

The implementation uses a distributed version of the Advantage Actor-Critic (A2C) method.
It consists of two types of processes:
+ **master process** (1 instance): It owns the neural network model. It broadcasts the network's weights to all "worker" processes (see below) and waits for mini-batches of experience. It then combines all the mini-batches, performs a network update using SGD, and broadcasts the updated weights to the workers again.
+ **worker process** (one per CPU core): Each worker has its own copy of the A2C agent. It receives the network weights from the "master" process (see above), plays sample Snake games, collects a mini-batch of experience, and sends it back to the master. Each worker then waits for an updated set of weights; see the sketch after this list.
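
As a loose illustration of this exchange, the runnable sketch below uses `multiprocessing` queues, with NumPy arrays standing in for the real network and random numbers standing in for Snake games; every name in it is illustrative, not taken from the repository:

```
# Illustrative sketch only: NumPy arrays stand in for the real network and
# random numbers stand in for Snake games. No names below come from the repo.
import multiprocessing as mp

import numpy as np


def worker(weight_q, batch_q, batch_size):
    """Receive weights, "play games", send a mini-batch of experience back."""
    while True:
        weights = weight_q.get()              # block until the master broadcasts
        if weights is None:                   # sentinel: shut down
            return
        # A real worker would run its A2C agent here; we fabricate
        # (state, advantage) pairs of the right shape instead.
        batch = [(np.random.randn(*weights.shape), np.random.randn())
                 for _ in range(batch_size)]
        batch_q.put(batch)


def master(num_workers=4, updates=3):
    weights = np.zeros((8, 8))                # toy "network": one weight matrix
    weight_qs = [mp.Queue() for _ in range(num_workers)]
    batch_q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q, batch_q, 5)) for q in weight_qs]
    for p in procs:
        p.start()
    for step in range(updates):
        for q in weight_qs:                   # broadcast current weights
            q.put(weights)
        batches = []
        for _ in range(num_workers):          # gather one mini-batch per worker
            batches.extend(batch_q.get())
        # Combine the mini-batches and take one (toy) SGD step.
        grad = np.mean([state * adv for state, adv in batches], axis=0)
        weights -= 0.01 * grad
    for q in weight_qs:
        q.put(None)                           # ask workers to exit
    for p in procs:
        p.join()


if __name__ == "__main__":
    master()
```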

Neural network architecture (sketched in code below):
+ Layers shared by both actor and critic: 4x convolutional layer (filters: 3x3, channels: 64).
+ Actor's head (policy head): 1x convolutional layer (filters: 1x1, channels: 2), followed by a fully connected layer (4 units, one per move: up, down, left, right).
+ Critic's head (value head): 1x convolutional layer (filters: 1x1, channels: 1), followed by a fully connected layer (64 units), followed by a fully connected layer (1 unit: the state's value).
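
As a loose sketch, this architecture could be expressed in TF 2.x Keras as follows; the input shape is an assumption, since the README does not state the board size or the number of input planes:

```
# Sketch of the two-headed network in TF 2.x Keras. The input shape below is
# an assumption; the README does not state the board size or channel count.
import tensorflow as tf
from tensorflow.keras import layers

BOARD_H, BOARD_W, PLANES = 10, 10, 3          # assumed input shape

inputs = tf.keras.Input(shape=(BOARD_H, BOARD_W, PLANES))

# Shared trunk: 4 convolutional layers, 3x3 filters, 64 channels each.
x = inputs
for _ in range(4):
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)

# Actor (policy) head: 1x1 conv, 2 channels, then 4 logits (up/down/left/right).
p = layers.Conv2D(2, 1, activation="relu")(x)
p = layers.Flatten()(p)
policy_logits = layers.Dense(4)(p)

# Critic (value) head: 1x1 conv, 1 channel, then Dense(64), then Dense(1).
v = layers.Conv2D(1, 1, activation="relu")(x)
v = layers.Flatten()(v)
v = layers.Dense(64, activation="relu")(v)
value = layers.Dense(1)(v)

model = tf.keras.Model(inputs=inputs, outputs=[policy_logits, value])
model.summary()
```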