{"id":28504700,"url":"https://github.com/elixir-code/menacer","last_synced_at":"2026-04-29T10:31:31.270Z","repository":{"id":140245435,"uuid":"163082245","full_name":"elixir-code/MENACER","owner":"elixir-code","description":"Machine Educable Noughts and Crosses Engine - Revived","archived":false,"fork":false,"pushed_at":"2019-01-19T17:58:55.000Z","size":260,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-06T04:37:35.002Z","etag":null,"topics":["markov-decision-processes","menace","reinforcement-learning","value-iteration-algorithm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elixir-code.png","metadata":{"files":{"readme":"docs/README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-12-25T12:48:35.000Z","updated_at":"2019-02-27T10:57:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"532d1373-e99a-4052-b031-145893894696","html_url":"https://github.com/elixir-code/MENACER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/elixir-code/MENACER","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-code%2FMENACER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-code%2FMENACER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-code%2FMENACER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-code%2FMENAC
ER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elixir-code","download_url":"https://codeload.github.com/elixir-code/MENACER/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-code%2FMENACER/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32421511,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["markov-decision-processes","menace","reinforcement-learning","value-iteration-algorithm"],"created_at":"2025-06-08T18:30:34.708Z","updated_at":"2026-04-29T10:31:31.265Z","avatar_url":"https://github.com/elixir-code.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"==============================================================\nMENACER: Machine Educable Noughts and Crosses Engine - Revived\n==============================================================\n\n.. From Layman's Perspective\n\n**MENACER** (``Machine Educable Noughts and Crosses Engine - Revived``) is a computer program that plays the game of Noughts and Crosses (aka. Tic-Tac-Toe). 
It learns, evolves and gets better at the game with every game it plays.\n\nAPI Examples\n============\n\n----------------\nMENACER vs Human\n----------------\n\n..\tcode:: python\n\t\n\tfrom menacer import AgentX, AgentO, playNoughtsCrosses\n\n\t# Initialize an MDP agent that plays 'X' with a random policy\n\tagentx = AgentX()\n\tagento = 'human'\n\n\twhile True:\n\t\t# Simulate a game between agentx and agento\n\t\tgame = playNoughtsCrosses(agentx, agento)\n\n\t\t# Update the MDP and policy of agentx to learn from the game\n\t\tagentx.learnGameplay([game])\n\n------------------\nMENACER vs MENACER\n------------------\n\n..\tcode:: python\n\n\tfrom menacer import AgentX, AgentO, playNoughtsCrosses\n\n\t# Initialize MDP agents that play 'X' and 'O' with random policies\n\tagentx = AgentX()\n\tagento = AgentO()\n\n\t# Define the number of games to be simulated\n\tn_games = 1000\n\n\tfor i_game in range(n_games):\n\n\t\t# Simulate a game between agentx and agento\n\t\tgame = playNoughtsCrosses(agentx, agento)\n\n\t\t# Update the MDPs and policies of both agents to learn from the game\n\t\tagentx.learnGameplay([game])\n\t\tagento.learnGameplay([game])\n\n--------------------------------------\nDumping and Loading Pre-Trained Agents\n--------------------------------------\n\nPre-trained agents can be *serialized* and dumped to, or loaded from, binary files using the `pickle \u003chttps://docs.python.org/3/library/pickle.html\u003e`_ module.\n\nHow it Works\n============\n\n.. 
From Reinforcement Learning Perspective\n\nMENACER is a simple **reinforcement learning** (RL) agent that uses a **Markov Decision Process (MDP)** model to capture the dynamics of the game, and the **value iteration** algorithm to determine the probabilistically optimal move to play for every possible configuration of the board.\n\nMENACER employs two separate **Markov Decision Process (MDP)** models to learn the dynamics of gameplay for the agents that play the Crosses ('X') and the Noughts ('O') respectively.\n\n..\tcontents:: Markov Decision Process (MDP) model\n\t:local:\n\n-------------------\nRepresenting States\n-------------------\n\nThe various possible configurations of the *Noughts and Crosses* board correspond to the states in the MDP models.\n\nThe MDP model of an agent (either the agent that plays *Noughts* or the agent that plays *Crosses*) only involves states where **the agent plays the next move**, along with states that correspond to the **won, lost and drawn** configurations of the board.\n\n..\timage:: static-assets/board.png\n\t:align: left\n\nThe board configurations can be represented in two forms:\n\n+ \t**String Representation:** a 9-character string composed of '**x**' (crosses), '**o**' (noughts) and '**.**' (empty space).\n\t**Example:** '.ooxo..x.'\n\n.. _`array representation`:\n\n+ \t**Array Representation:** a 9-element array composed of '1' (crosses), '0' (empty space) and '-1' (noughts).\n\t**Example:** [0, -1, -1, 1, -1, 0, 0, 1, 0]\n\nThe string representation can conveniently serve as the key in the hash-based data structures that store the policy, transition probabilities and rewards for the states in the MDP model.\n\nStandard Form of a State\n------------------------\n\nThe next move to play for a given board configuration is symmetric (or identical) for the *rotated and/or mirrored* configurations of that board. 
The board configurations can be represented in a rotation- and mirror-invariant form called the **standard form**.\n\n\tThe **standard form** of a board configuration is the configuration with the lexicographically largest `array representation`_ among the eight possible rotated and/or mirrored configurations of the given board.\n\n..  image:: static-assets/standard-form.png\n..\n\n\nUsing the standard form of the board configurations to represent the states in the MDP models greatly reduces the number of possible states, since up to eight symmetric boards map to a single state, and improves the learning capability of the agent.\n\n--------------------\nRepresenting Actions\n--------------------\n\n.. \timage:: static-assets/board-actions.png\n\t:align: left\n\n.. End of image directive\n\nThe various *next moves* (or positions) that an agent can play for a given board configuration correspond to the **actions** that can be performed at the corresponding state in the MDP model.\n\nThe possible actions at a given state in the MDP models are encoded as a *subset of the numbers 0 to 8*, each corresponding to one of the nine positions on the board.\n\n\nContributor's Section\n=====================\n\nThe MENACER community encourages all its members to contribute to the project, in however small a way.\n\nSome of the important milestones in the future roadmap of MENACER include:\n\n+ **Creation of Website:** Because many hours of training are necessary for the agents to capture the complete dynamics of the game and become expert players of Noughts and Crosses, creating a website where users can play against MENACER is the primary focus.\n  \nPlease refer to the `issues \u003chttps://github.com/elixir-code/MENACER/issues\u003e`_ section for related discussion and more information on possible directions of future 
work.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felixir-code%2Fmenacer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felixir-code%2Fmenacer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felixir-code%2Fmenacer/lists"}