Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zackakil/deep-tic-tac-toe
Used deep reinforcement learning to train a deep neural network to play tic-tac-toe and deployed using tensorflow.js.
- Host: GitHub
- URL: https://github.com/zackakil/deep-tic-tac-toe
- Owner: ZackAkil
- Created: 2020-02-02T17:39:20.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-30T10:39:16.000Z (about 2 months ago)
- Last Synced: 2024-12-18T12:09:15.220Z (10 days ago)
- Topics: convolutional-neural-networks, keras, machine-learning, neural-network, reinforcement-learning, tensorflow-js
- Language: Jupyter Notebook
- Homepage: https://zackakil.github.io/deep-tic-tac-toe/
- Size: 877 KB
- Stars: 59
- Watchers: 8
- Forks: 15
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Deep Tic-Tac-Toe [Play game](https://zackakil.github.io/deep-tic-tac-toe)
This project uses deep reinforcement learning to train a neural network to play Tic-Tac-Toe. The trained model is deployed in a web browser using TensorFlow.js.
![screenshot](screen_shot.png)
## How it Works
The project consists of two main components:
1. **Model Training (Python):** Two Jupyter Notebooks (`deep_learning_tic_tac_toe_model_training.ipynb` and `[player_goes_first]_deep_learning_tic_tac_toe_model_training.ipynb`) handle training the neural network, a convolutional neural network (CNN) built with Keras. The training process involves:
* **Game Environment:** A custom `XandOs` class simulates the Tic-Tac-Toe environment, allowing the agent to interact with it.
* **Reinforcement Learning:** The agent learns through experience by playing against a random agent. Rewards are assigned for wins, losses, ties, and invalid moves.
* **Experience Replay:** Game states, actions, and rewards are stored in a memory buffer (`memory`). The agent learns from a batch of randomly sampled experiences from this buffer, improving stability and convergence.
* **CNN Architecture:** The CNN takes the current game board (represented as a 3x3x2 tensor, where the two channels indicate player 1 and player 2's marks) as input and outputs a probability distribution over the 9 possible moves.
    * **Training Loop:** The agent repeatedly plays games, stores experiences in memory, and updates the CNN's weights based on the rewards received (see the training sketch after this list).
2. **Web Deployment (TensorFlow.js):** The trained model is converted to the TensorFlow.js Layers format and loaded in the browser by `index.html`, which provides the user interface for playing against the AI. The `predict` function takes the current game grid as input and uses the loaded model to select the AI's next move; a small delay is added before the move to simulate "thinking" time (see the conversion sketch below).
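The following is a minimal, hypothetical sketch of the training setup described above. The names `build_model` and `replay_update`, the layer sizes, and the reward handling are illustrative assumptions rather than the notebooks' actual code, and the `XandOs` environment is omitted for brevity:

```python
import random
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    # Input: the 3x3 board with two channels (player 1's marks, player 2's marks).
    # Output: a probability distribution over the 9 board positions.
    model = keras.Sequential([
        layers.Conv2D(32, (2, 2), activation="relu", padding="same",
                      input_shape=(3, 3, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(9, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

memory = []  # experience replay buffer of (state, action, reward) tuples

def replay_update(model, batch_size=32):
    # Learn from a random batch rather than the most recent game, so that
    # correlated consecutive moves do not destabilise training.
    if not memory:
        return
    batch = random.sample(memory, min(batch_size, len(memory)))
    states = np.array([state for state, _, _ in batch])
    targets = model.predict(states, verbose=0)
    for i, (_, action, reward) in enumerate(batch):
        targets[i, action] = reward  # push the played move toward its reward
    model.fit(states, targets, verbose=0)
```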
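Converting the trained Keras model into the `model/model.json` Layers file that the page loads can be done with the `tensorflowjs` Python package; a sketch, assuming `model` is the trained Keras model from the notebook:

```python
import tensorflowjs as tfjs

# Writes model.json plus binary weight shards into the model/ directory,
# which index.html can then load with tf.loadLayersModel('model/model.json').
tfjs.converters.save_keras_model(model, "model/")
```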
## Dependencies
* **Python:** NumPy, Matplotlib, Keras, TensorFlow (or TensorFlow 1.x in Colab)
* **Web:** Vue.js, TensorFlow.js

## Key Files
* **`deep_learning_tic_tac_toe_model_training.ipynb`:** Jupyter Notebook for training the AI model.
* **`[player_goes_first]_deep_learning_tic_tac_toe_model_training.ipynb`:** Jupyter Notebook for training the variant of the model where the player moves first.
* **`index.html`:** HTML file for the web-based game.
* **`model/model.json`:** TensorFlow.js Layers model file.
* **`python model weights/winer_weights.keras`:** Keras model weights for the variant of the model trained with the agent moving second.

## Potential Improvements
* **Training against a stronger opponent:** The current random agent is a weak opponent. Training against a minimax player (see the sketch after this list) or another deep learning agent could produce a stronger AI.
* **Exploring different network architectures:** Experimenting with different CNN architectures or other types of neural networks (e.g., recurrent neural networks) might improve performance.
* **Hyperparameter tuning:** Fine-tuning the hyperparameters (e.g., learning rate, batch size, decay rate) used during training could lead to better results.
* **Adding difficulty levels:** Implement different difficulty levels by adjusting the epsilon-greedy exploration strategy (see the sketch after this list) or by switching between differently trained models.
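A hedged sketch of the minimax opponent idea (not part of this repository): the board is a flat list of 9 cells holding 1, -1, or 0, and `minimax` returns the best achievable score and move for `player`.

```python
def winner(board):
    # Returns 1 or -1 if that mark has three in a row, else 0.
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player):
    # Negamax: score is always from the current player's point of view.
    w = winner(board)
    if w != 0:
        return w * player, None          # +1 if `player` has won, -1 if lost
    moves = [i for i, cell in enumerate(board) if cell == 0]
    if not moves:
        return 0, None                   # draw
    best_score, best_move = -2, None
    for m in moves:
        board[m] = player
        score, _ = minimax(board, -player)
        board[m] = 0
        if -score > best_score:          # negate the opponent's score
            best_score, best_move = -score, m
    return best_score, best_move
```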
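And a sketch of epsilon-greedy difficulty levels, again with illustrative names (`choose_move`, `DIFFICULTY_EPSILON`) that do not appear in the repository:

```python
import random
import numpy as np

DIFFICULTY_EPSILON = {"easy": 0.75, "medium": 0.35, "hard": 0.0}

def choose_move(model, state, legal_moves, level="medium"):
    # With probability epsilon, play a random legal move; otherwise play
    # the model's highest-probability legal move. Lower epsilon = harder AI.
    if random.random() < DIFFICULTY_EPSILON[level]:
        return random.choice(legal_moves)
    probs = model.predict(state[np.newaxis], verbose=0)[0]
    masked = [probs[i] if i in legal_moves else -1.0 for i in range(9)]
    return int(np.argmax(masked))
```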