Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mauroluzzatto/openai-gym-tictactoe-environment
OpenAI Gym based Tic-Tac-Toe Environment
https://github.com/mauroluzzatto/openai-gym-tictactoe-environment
gym gym-environment openai-gym python reinforcement-learning tic-tac-toe tictactoe-environment
Last synced: about 2 months ago
JSON representation
OpenAI Gym based Tic-Tac-Toe Environment
- Host: GitHub
- URL: https://github.com/mauroluzzatto/openai-gym-tictactoe-environment
- Owner: MauroLuzzatto
- Created: 2019-09-22T09:50:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-02-03T12:06:59.000Z (11 months ago)
- Last Synced: 2024-06-14T02:26:49.727Z (7 months ago)
- Topics: gym, gym-environment, openai-gym, python, reinforcement-learning, tic-tac-toe, tictactoe-environment
- Language: Jupyter Notebook
- Homepage:
- Size: 389 KB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
![Python](https://img.shields.io/badge/python-v3.6+-blue.svg)
# OpenAI Gym – TicTacToe Environment
This repository contains a TicTacToe-Environment based on the OpenAI Gym module.
An example on how to use this environment with a Q-Learning algorithm that learns to play TicTacToe through self-play can be found [here](https://github.com/MauroLuzzatto/learn-tictactoe-through-self-play).
TicTacToe is a board game, where two players compete to place three stones of their color in parallel (horizontally or vertically) or diagonally to win the game.
| Environment Id | Observation Space |Action Space| Reward Range | tStepL |
| ------ | ------ |------ | ------ |------ |
| TTT-v0 |Box(3,3)|Discrete(9)|(-inf, inf) | 9 |## Getting Started
### 1) Setup
```
git clone [email protected]:MauroLuzzatto/OpenAI-Gym-TicTacToe-Environment.git
cd OpenAI-Gym-TicTacToe-Environment
pip install -r requirements.txt
```### 2) Register the Environment
run the following command in the command line
```
pip install -e gym-TicTacToe/.
```now you should see the following message
```
"Successfully installed gym-TicTacToe"
```
Further information on how to register an gym environment can be found [here](gym-TicTacToe/README.md)### 3) Use the environment and start playing
```
import gym_TicTacToe
import gym# initialize the tictactoe environment
env = gym.envs.make("TTT-v0", small=-1, large=10)# start playing
color = 1
action = 0
new_state, reward, done, _ = env.step((action, color))
```## TicTacToe Environment
### Methods
The environment contains the following four main methods:
- **reset**: reset the board game
- **step**: add a stone of a color of a player in the board
- **decode_action**: convert action from 0 to 9 into column and row values
- **render**: render the stones on the board game### Actions
The action space contains integers from 0 to 9, each representing a board field. The table below shows the action number and its corresponding board position.| | | |
| :---: | :---: | :---: |
| 0 | 1 | 2 |
| 3 | 4 | 5 |
| 6 | 7 | 8 |### States
State space:
- On a 3x3 board are theoretically 3^n^2 = 3^3^2 = 19’683 stone combinations of two different colors (and no color) possible (n = the size of the square filed)
- However, not all combinations are legal (e.g. you can have a board full of stones from one color)
- The state space can therefore be reduced to 8’953 statesState representation:
- The state is represented as a 3x3 numpy array representing the 9 fields of the board
- Dependent on the moves, the array will get the value
- 0 for no stone,
- 1 = stone of player 1
- 2 = stone of player 2### Rewards
There are three different types of rewards in this environment:
- Large reward (env.large = 10) when the player wins (= +10)
- Small negative reward (env.small = -1) for every move played (= -1)### Done
The game finishes:
- When one of the players has three stones either horizontally, vertically or diagonally
- When the board is full of stones, but there is no winner## References
- [OpenAI Gym](https://gym.openai.com/): Gym is a toolkit for developing and comparing reinforcement learning algorithms from OpenAI
- [Tic Tac Toe](https://en.wikipedia.org/wiki/Tic-tac-toe)