{"id":19729504,"url":"https://github.com/bb4/bb4-q-learning","last_synced_at":"2025-07-20T02:34:25.606Z","repository":{"id":71446026,"uuid":"119262087","full_name":"bb4/bb4-Q-learning","owner":"bb4","description":"A generic Q-Learning with an example Tic-Tac-Toe implementation which uses it","archived":false,"fork":false,"pushed_at":"2023-05-21T15:48:01.000Z","size":772,"stargazers_count":2,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-10T17:41:32.538Z","etag":null,"topics":["ai","q-learning"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bb4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-28T13:45:47.000Z","updated_at":"2021-11-01T13:19:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"38a8da2d-b941-4c6f-be23-a3dbc4579175","html_url":"https://github.com/bb4/bb4-Q-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bb4%2Fbb4-Q-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bb4%2Fbb4-Q-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bb4%2Fbb4-Q-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bb4%2Fbb4-Q-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bb4","download_url":"https://codeload.github.com/bb4/
bb4-Q-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241052527,"owners_count":19901042,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","q-learning"],"created_at":"2024-11-12T00:12:39.725Z","updated_at":"2025-02-27T19:51:24.252Z","avatar_url":"https://github.com/bb4.png","language":"Scala","readme":"# Q-learning\n\n[Q-Learning](https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0) is a strategy for reinforcement learning.\nThis project is a generic Q-learning library that can be applied to a variety of domains.\n \nExample implementations are provided for Tic-Tac-Toe, Frozen Lake, and Finger Chopsticks. \nEach demonstrates how Q-learning can be used to find an optimal strategy to maximize results when the state space is relatively small.\nTic-Tac-Toe, for example, is a simple game with only a few thousand possible states (fewer if you account for symmetry). 
\nFor domains where the state space is much larger, some approximation of the actual set of states can be used - \nsuch as a neural network.\n\n## How to Run\n\nFrom git-bash, cygwin, cmd, or an online IDE shell (such as [codenvy](https://codenvy.io)), run\n```shell\ngit clone https://github.com/bb4/bb4-Q-learning.git    (to clone the project repository locally)\ncd bb4-Q-learning                                      (move into the new local project directory)\n./gradlew runTTT                                       (to play Tic-Tac-Toe)\n./gradlew runFrozenLake                                (to run the Frozen Lake demo)\n./gradlew runChopsticks                                (to play finger chopsticks)\n```\n\n## Learn More\n\nSee [my presentation](https://docs.google.com/presentation/d/15X9KhhHxtXNZtxt-GB17prfmXOKu7EvklmZL-nz5yjQ/edit?usp=sharing) to JLHS students.\n\n## Results\n\nBelow are some surface plots, created with [Plotly](https://plot.ly/create/?fid=plotly2_demo:140), that show how well the Q-learning model learns in different domains. \nThe axes on the base are epsilon and the number of learning trials (or episodes). 
It's clear that more learning trials yield higher accuracy.\nThe epsilon parameter determines the amount of random exploration versus exploitation of knowledge learned so far.\nWhen epsilon is larger, each transition is more likely to be selected at random, leading to more exploration of the space.\n\n![Tic Tac Toe accuracy](results/ttt-accuracy.JPG)\n\u003cbr\u003eTic Tac Toe learning accuracy for different values of epsilon and number of trial runs.\n\n![Frozen Lake accuracy](results/large-windy-lake-accuracy.JPG)\n\u003cbr\u003eFrozen Lake learning accuracy for different values of epsilon and number of trial runs.\n\n![Finger Chopsticks accuracy](results/large-chopsticks-accuracy.png)\n\u003cbr\u003eFinger chopsticks learning accuracy for different values of epsilon and number of trial runs.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbb4%2Fbb4-q-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbb4%2Fbb4-q-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbb4%2Fbb4-q-learning/lists"}