Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zackakil/deep-tic-tac-toe
Used deep reinforcement learning to train a deep neural network to play tic-tac-toe and deployed using tensorflow.js.
- Host: GitHub
- URL: https://github.com/zackakil/deep-tic-tac-toe
- Owner: ZackAkil
- Created: 2020-02-02T17:39:20.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-30T10:39:16.000Z (about 2 months ago)
- Last Synced: 2024-12-18T12:09:15.220Z (10 days ago)
- Topics: convolutional-neural-networks, keras, machine-learning, neural-network, reinforcement-learning, tensorflow-js
- Language: Jupyter Notebook
- Homepage: https://zackakil.github.io/deep-tic-tac-toe/
- Size: 877 KB
- Stars: 59
- Watchers: 8
- Forks: 15
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Deep Tic-Tac-Toe [Play game](https://zackakil.github.io/deep-tic-tac-toe)
This project uses deep reinforcement learning to train a neural network to play Tic-Tac-Toe. The trained model is deployed in a web browser using TensorFlow.js.
![screenshot](screen_shot.png)
## How it Works
The project consists of two main components:
1. **Model Training (Python):** Two Jupyter Notebooks (`deep_learning_tic_tac_toe_model_training.ipynb` and `[player_goes_first]_deep_learning_tic_tac_toe_model_training.ipynb`) handle training the neural network, a convolutional neural network (CNN) built with Keras. The training process involves:
* **Game Environment:** A custom `XandOs` class simulates the Tic-Tac-Toe environment, allowing the agent to interact with it.
* **Reinforcement Learning:** The agent learns through experience by playing against a random agent. Rewards are assigned for wins, losses, ties, and invalid moves.
* **Experience Replay:** Game states, actions, and rewards are stored in a memory buffer (`memory`). The agent learns from a batch of randomly sampled experiences from this buffer, improving stability and convergence.
* **CNN Architecture:** The CNN takes the current game board (represented as a 3x3x2 tensor, where the two channels indicate player 1 and player 2's marks) as input and outputs a probability distribution over the 9 possible moves.
    * **Training Loop:** The agent repeatedly plays games, stores experiences in memory, and updates the CNN's weights based on the rewards received (see the training sketch after this list).
2. **Web Deployment (TensorFlow.js):** The trained model is converted to the TensorFlow.js Layers format and loaded in the browser by `index.html`, which provides the user interface for playing against the AI. The `predict` function takes the current game grid as input and uses the loaded model to select the AI's next move; a small delay is added before the move to simulate "thinking" time (see the conversion sketch below).
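The following is a minimal, hypothetical sketch of the training setup described above. The names `build_model` and `replay_update`, the layer sizes, and the reward handling are illustrative assumptions rather than the notebooks' actual code, and the `XandOs` environment is omitted for brevity:

```python
import random
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    # Input: the 3x3 board with two channels (player 1's marks, player 2's marks).
    # Output: a probability distribution over the 9 board positions.
    model = keras.Sequential([
        layers.Conv2D(32, (2, 2), activation="relu", padding="same",
                      input_shape=(3, 3, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(9, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

memory = []  # experience replay buffer of (state, action, reward) tuples

def replay_update(model, batch_size=32):
    # Learn from a random batch rather than the most recent game, so that
    # correlated consecutive moves do not destabilise training.
    if not memory:
        return
    batch = random.sample(memory, min(batch_size, len(memory)))
    states = np.array([state for state, _, _ in batch])
    targets = model.predict(states, verbose=0)
    for i, (_, action, reward) in enumerate(batch):
        targets[i, action] = reward  # push the played move toward its reward
    model.fit(states, targets, verbose=0)
```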
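Converting the trained Keras model into the `model/model.json` Layers file that the page loads can be done with the `tensorflowjs` Python package; a sketch, assuming `model` is the trained Keras model from the notebook:

```python
import tensorflowjs as tfjs

# Writes model.json plus binary weight shards into the model/ directory,
# which index.html can then load with tf.loadLayersModel('model/model.json').
tfjs.converters.save_keras_model(model, "model/")
```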
## Dependencies
* **Python:** NumPy, Matplotlib, Keras, TensorFlow (or TensorFlow 1.x in Colab)
* **Web:** Vue.js, TensorFlow.js

## Key Files
* **`deep_learning_tic_tac_toe_model_training.ipynb`:** Jupyter Notebook for training the AI model.
* **`[player_goes_first]_deep_learning_tic_tac_toe_model_training.ipynb`:** Jupyter Notebook for training the variant of the model where the player moves first.
* **`index.html`:** HTML file for the web-based game.
* **`model/model.json`:** TensorFlow.js Layers model file.
* **`python model weights/winer_weights.keras`:** Keras model weights for the variant of the model trained with the agent moving second.

## Potential Improvements
* **Training against a stronger opponent:** The current random agent is a weak opponent. Training against a minimax player (see the sketch after this list) or another deep learning agent could produce a stronger AI.
* **Exploring different network architectures:** Experimenting with different CNN architectures or other types of neural networks (e.g., recurrent neural networks) might improve performance.
* **Hyperparameter tuning:** Fine-tuning the hyperparameters (e.g., learning rate, batch size, decay rate) used during training could lead to better results.
* **Adding difficulty levels:** Implement different difficulty levels by adjusting the epsilon-greedy exploration strategy (see the sketch after this list) or by switching between differently trained models.
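A hedged sketch of the minimax opponent idea (not part of this repository): the board is a flat list of 9 cells holding 1, -1, or 0, and `minimax` returns the best achievable score and move for `player`.

```python
def winner(board):
    # Returns 1 or -1 if that mark has three in a row, else 0.
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player):
    # Negamax: score is always from the current player's point of view.
    w = winner(board)
    if w != 0:
        return w * player, None          # +1 if `player` has won, -1 if lost
    moves = [i for i, cell in enumerate(board) if cell == 0]
    if not moves:
        return 0, None                   # draw
    best_score, best_move = -2, None
    for m in moves:
        board[m] = player
        score, _ = minimax(board, -player)
        board[m] = 0
        if -score > best_score:          # negate the opponent's score
            best_score, best_move = -score, m
    return best_score, best_move
```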
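And a sketch of epsilon-greedy difficulty levels, again with illustrative names (`choose_move`, `DIFFICULTY_EPSILON`) that do not appear in the repository:

```python
import random
import numpy as np

DIFFICULTY_EPSILON = {"easy": 0.75, "medium": 0.35, "hard": 0.0}

def choose_move(model, state, legal_moves, level="medium"):
    # With probability epsilon, play a random legal move; otherwise play
    # the model's highest-probability legal move. Lower epsilon = harder AI.
    if random.random() < DIFFICULTY_EPSILON[level]:
        return random.choice(legal_moves)
    probs = model.predict(state[np.newaxis], verbose=0)[0]
    masked = [probs[i] if i in legal_moves else -1.0 for i in range(9)]
    return int(np.argmax(masked))
```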