{"id":23050749,"url":"https://github.com/taka-rl/tic-tac-toe_q_learning","last_synced_at":"2026-05-09T00:32:18.033Z","repository":{"id":246596326,"uuid":"821568094","full_name":"taka-rl/tic-tac-toe_q_learning","owner":"taka-rl","description":"tic-tac-toe with q-learning","archived":false,"fork":false,"pushed_at":"2025-02-21T20:11:52.000Z","size":1371,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-21T21:24:20.329Z","etag":null,"topics":["matplotlib","multithreading","object-oriented-programming","oop","pandas","parallel-training","python","python3","q-learning","reinforcement-learning","rl-qlearning","tic-tac-toe","tic-tac-toe-game","tic-tac-toe-python","tictactoe"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taka-rl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-28T21:02:42.000Z","updated_at":"2025-02-21T20:11:56.000Z","dependencies_parsed_at":"2024-11-08T12:27:25.131Z","dependency_job_id":"871444f3-ccd1-49d0-838f-5af76a6d8dbc","html_url":"https://github.com/taka-rl/tic-tac-toe_q_learning","commit_stats":null,"previous_names":["taka-rl/tic-tac-toe_q_learning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taka-rl%2Ftic-tac-toe_q_learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taka-rl%2Ftic-tac-toe_q_learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/taka-rl%2Ftic-tac-toe_q_learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taka-rl%2Ftic-tac-toe_q_learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taka-rl","download_url":"https://codeload.github.com/taka-rl/tic-tac-toe_q_learning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246933355,"owners_count":20857052,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matplotlib","multithreading","object-oriented-programming","oop","pandas","parallel-training","python","python3","q-learning","reinforcement-learning","rl-qlearning","tic-tac-toe","tic-tac-toe-game","tic-tac-toe-python","tictactoe"],"created_at":"2024-12-15T23:37:00.000Z","updated_at":"2026-05-09T00:32:13.006Z","avatar_url":"https://github.com/taka-rl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tic-tac-toe with Q-learning\nThis is a tic-tac-toe game built using Q-learning, a reinforcement learning (RL) algorithm.\n\nAs a result of the training with 100,000 episodes where the agent played against a computer that made random moves, the agent won at around 75% of the games against the computer. The average reward was approximately 0.7, and the maximum reward exceeded 0.9.\n\n## About Q-learning\n    Q-learning algorithm:\n        Q-learning is a model-free, value-based, off-policy reinforcement learning (RL) algorithm\n        based on the Bellman equation. 
It uses a Q-table to store Q-values for state-action pairs,\n        which represent the expected future rewards for taking specific actions in specific states.\n\n    Steps in Q-Learning Algorithm:\n        1: Initialize the Q-values for all state-action pairs arbitrarily (often to zero).\n        2: Observe the current state\n        3: Select an action a based on the current policy (e.g., ε-greedy).\n        4: Perform the action a and observe the reward r and the next state\n        5: Update the Q-value using the Q-learning equation.\n        6: Set the current state s to the next state\n        7: Repeat steps 2-6 until the termination condition is met.\n\n    Q-table:\n        The Q-table is defined as a list data type and is stored as a csv file with the following format.\n        state, action, q_value\n            \"[[0, 0, 0], [0, 0, 0], [0, 0, 0]]\", 1, 0.0\n\n    Q-value Update Equation:\n        The Q-values are updated using the standard Q-learning update equation:\n            Q_new(s, a) = Q(s, a) + alpha * (r + γ * max(Q(s', a')) - Q(s, a))\n        Where:\n            - Q(s, a): Current Q-value for the state-action pair (s, a)\n            - alpha: Learning rate\n            - γ: Discount factor (gamma)\n            - max(Q(s', a')): Maximum Q-value for the next state s'\n            - s: Current state\n            - a: Action taken in the current state\n            - r: Reward received after taking action\n    Links:\n        ・Introduce Q-learning algorithm\n        https://towardsdatascience.com/an-ai-agent-learns-to-play-tic-tac-toe-part-3-training-a-q-learning-rl-agent-2871cef2faf0\n        https://medium.com/@ardra4/tic-tac-toe-using-q-learning-a-reinforcement-learning-approach-d606cfdd64a3\n        https://medium.com/@kaneel.senevirathne/teaching-agents-to-play-tic-tac-toe-using-reinforcement-learning-7a9d4d6ee9b3\n        
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial?dc_referrer=https%3A%2F%2Fwww.google.com%2F\n        https://towardsdatascience.com/reinforcement-learning-implement-tictactoe-189582bea542\n\n## RL Environment\nAction: Choose a move between 1 and 9.  \nState: Board configuration represented as a string, e.g., \"[[0, 0, 0], [0, 0, 0], [0, 0, 0]]\".  \nReward: Win: +1, Tie: +0.5, Loss: -1\n\n## Folder structure\n\n    ├── src                     # code for the tic-tac-toe environment\n    │   ├── board.py            # for board\n    │   ├── game.py             # for game\n    │   ├── move.py             # for move\n    │   ├── player.py           # for player\n    │   └── rl.py               # for Q-learning algorithm\n    ├── training                # code for training\n    │   ├── training_results    # \n    │   │    └── plan1          # result files for training plan 1\n    │   ├── training.py         # for training\n    │   ├── result_analysis.py  # for analyzing the training result\n    │   ├── training_result.csv # training result file\n    │   ├── q_table.csv         # Q-table file generated after 100,000 episodes\n    │   └── README.md           #\n    ├── main.py                 # runs the app\n    ├── .gitignore\n    ├── requirements.txt\n    └── README.md\n\n\n## Preparation to use\n1. Clone this project  \n``` git clone https://github.com/taka-rl/tic-tac-toe_q_learning.git``` \n2. If you only want to play tic-tac-toe, see \"Play Tic-Tac-Toe\".  \n3. If you want to train the agent, install the required libraries with the following command:  \n   On Windows:\n   ```python -m pip install -r requirements.txt```  \n   On macOS:\n   ```pip3 install -r requirements.txt```\n\n\n## Play Tic-Tac-Toe\nTo play tic-tac-toe, run `main.py`.  \nChoose a game mode between 1 and 6.  
\n![image](https://github.com/user-attachments/assets/d3f527d9-5600-40a5-b7e0-9ece4d765c8f)\n\n## Training\nIf you would like to train an agent, run `training.py`.   \nSee the [training README](https://github.com/taka-rl/tic-tac-toe_q_learning/tree/main/training/README.md) for more information.\n\n\n## Result\n### Plan 1: Same Parameter Settings with Different Numbers of Episodes  \nTraining environment:\n- During training, the agent plays against a computer that makes random moves.\n- Reward setting: win=1, tie=0.5, loss=-1\n- The parameter settings in `config.py` are as follows:\n```\nCONFIGURATIONS = [\n    Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=1000, identifier=\"training1_1\"),\n    Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=10000, identifier=\"training1_2\"),\n    Config(learning_rate=0.1, discount_factor=0.9, epsilon=0.1, num_episodes=100000, identifier=\"training1_3\"),\n]\n```\nExpectation: The average reward should increase as the number of episodes increases.  \nResult: The average reward increased, and the number of wins also rose.  \n\n\nThe average reward was calculated every 100 games.\n\nNumber of episodes: 1000.  \n![image](https://github.com/user-attachments/assets/929e6e52-d8cb-46b2-a3a7-43354cd9d029)\n\nNumber of episodes: 10000.  \n![image](https://github.com/user-attachments/assets/98de423d-3650-464b-997d-a556b017a35c)\n\nNumber of episodes: 100000.  \n![image](https://github.com/user-attachments/assets/857a8069-309d-4447-a4a1-bd40c267fffb)\n![image](https://github.com/user-attachments/assets/068547f0-fd95-4824-991d-b64a9b122756)\n\nWin/Loss/Tie:  \nOver the course of training, the number of wins increased and the number of losses gradually decreased.   
\n![image](https://github.com/user-attachments/assets/5c3700ef-904b-429e-af85-bf0a1b35ab64)\n\n## Todo\n- Conducting Multi-Agent training\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaka-rl%2Ftic-tac-toe_q_learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaka-rl%2Ftic-tac-toe_q_learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaka-rl%2Ftic-tac-toe_q_learning/lists"}