# Objective

The goal of this project is to program an agent that learns a route through a maze based on rewards.
The target is a rewarded location in that maze, and the agent learns to navigate from a fixed start to this rewarded target through exploration and repetition.

The approach taken is Reinforcement Learning (RL): specifically Q-Learning, a model-free, off-policy RL algorithm, using a tabular representation of `Q(s,a)`, i.e., a matrix with two dimensions `S` and `A`, where `S` is the set of all possible states (the locations in the maze) and `A` is the set of actions available in each state (up, down, left, right).
The value at position `(s,a)` in the matrix is the Q-value `Q(s,a)`, i.e., the estimated value of taking action `a` in state `s`.

Essentially, the problem boils down to learning an `S x A` matrix of values, each value representing the utility of one state-action pair.
Every step the agent takes updates one such value, namely the one for the last visited `(s,a)` pair. Because Q-learning is off-policy (updates are not based on the action actually chosen next), this value is updated using the value of the best possible next action, `max Q(s',a')`.

The full update rule is:

`Q(s,a).new = Q(s,a).old + a * (r + y * maxQ(s',a') - Q(s,a).old)`

where `a` is the learning rate, `y` is the discount factor, `r` is the reward received, and `maxQ(s',a')` is the Q-value of the best action found so far in the next state `s'`.

The agent learns by taking actions and updating `Q(s,a)` after each one. I will experiment with action selection (the policy), implemented with the `e`-greedy algorithm.
This method takes either a random action with probability `e` (exploration) or a greedy action with probability `1 - e` (exploitation). A random action is drawn uniformly from the 4 possible actions, while the greedy action is the one with the highest predicted value so far, i.e., the action with the highest `Q(s,a)` in the current state.
Typically, a higher `e` lowers the expected reward on any single step, but the extra exploration yields a greater total reward in the long run.
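The update rule and `e`-greedy policy above can be sketched in Java (the project's language) roughly as follows. This is a minimal illustration, not the repository's actual code; the class and method names are made up for the example:

```java
import java.util.Random;

// Minimal tabular Q-learning sketch: an S x A table, e-greedy action
// selection, and the off-policy update rule from the README.
// All names here (QMaze, selectAction, update) are illustrative.
public class QMaze {
    static final int ACTIONS = 4; // up, down, left, right

    final double[][] q;     // Q(s,a) table, one row per maze location
    final double alpha;     // learning rate ("a" in the update rule)
    final double gamma;     // discount factor ("y" in the update rule)
    final double epsilon;   // exploration probability ("e")
    final Random rng = new Random();

    QMaze(int states, double alpha, double gamma, double epsilon) {
        this.q = new double[states][ACTIONS];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
    }

    // e-greedy policy: random action with probability epsilon,
    // otherwise the action with the highest Q(s,a) in state s.
    int selectAction(int s) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(ACTIONS);
        }
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) {
            if (q[s][a] > q[s][best]) best = a;
        }
        return best;
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    // Off-policy: uses the best next action, not the one actually taken.
    void update(int s, int a, double r, int sNext) {
        double maxNext = q[sNext][0];
        for (int a2 = 1; a2 < ACTIONS; a2++) {
            maxNext = Math.max(maxNext, q[sNext][a2]);
        }
        q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
    }
}
```

A training loop would repeatedly call `selectAction`, take the step in the maze to obtain `r` and `sNext`, and then call `update` for the visited `(s,a)` pair.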