{"id":23080627,"url":"https://github.com/chen0040/java-reinforcement-learning","last_synced_at":"2025-08-15T22:31:04.752Z","repository":{"id":85884476,"uuid":"90450297","full_name":"chen0040/java-reinforcement-learning","owner":"chen0040","description":"Package provides java implementation of reinforcement learning algorithms such as Q-Learn, R-Learn, SARSA, Actor-Critic","archived":false,"fork":false,"pushed_at":"2019-05-18T16:31:13.000Z","size":158,"stargazers_count":112,"open_issues_count":7,"forks_count":41,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-02-02T17:53:02.548Z","etag":null,"topics":["actor-critic","java","q-learning","reinforcement-learning","sarsa","sarsa-lambda"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chen0040.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-05-06T08:54:13.000Z","updated_at":"2024-01-08T07:18:45.000Z","dependencies_parsed_at":"2023-03-13T07:18:50.018Z","dependency_job_id":null,"html_url":"https://github.com/chen0040/java-reinforcement-learning","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"f85cb03e5d16512f6bb9e126fa940b9e49d5bde7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fjava-reinforcement-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fjava-reinforcement-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%
2Fjava-reinforcement-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chen0040%2Fjava-reinforcement-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chen0040","download_url":"https://codeload.github.com/chen0040/java-reinforcement-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229964387,"owners_count":18152034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","java","q-learning","reinforcement-learning","sarsa","sarsa-lambda"],"created_at":"2024-12-16T13:15:53.737Z","updated_at":"2024-12-16T13:15:54.398Z","avatar_url":"https://github.com/chen0040.png","language":"Java","readme":"# java-reinforcement-learning\nThis package provides a Java implementation of the reinforcement learning algorithms described in the book \"Reinforcement Learning: An Introduction\" by Sutton and Barto.\n\n[![Build Status](https://travis-ci.org/chen0040/java-reinforcement-learning.svg?branch=master)](https://travis-ci.org/chen0040/java-reinforcement-learning) [![Coverage Status](https://coveralls.io/repos/github/chen0040/java-reinforcement-learning/badge.svg?branch=master)](https://coveralls.io/github/chen0040/java-reinforcement-learning?branch=master)\n\n\n# Features\n\nThe following reinforcement learning algorithms are implemented:\n\n* R-Learn\n* Q-Learn\n* Q-Learn with eligibility trace\n* SARSA\n* SARSA with eligibility trace\n* Actor-Critic\n* Actor-Critic with eligibility trace\n\nThe package also supports a number of action-selection strategies:\n\n* 
soft-max\n* epsilon-greedy\n* greedy\n* Gibbs-soft-max\n\n\n![Reinforcement Learning](images/rl.jpg)\n\n# Install\n\nAdd the following dependency to your POM file:\n\n```\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.github.chen0040\u003c/groupId\u003e\n  \u003cartifactId\u003ejava-reinforcement-learning\u003c/artifactId\u003e\n  \u003cversion\u003e1.0.5\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n# Application Samples\n\nApplication samples of this library can be found in the following repositories:\n\n* [java-reinforcement-learning-tic-tac-toe](https://github.com/chen0040/java-reinforcement-learning-tic-tac-toe)\n* [java-reinforcement-learning-flappy-bird](https://github.com/chen0040/java-reinforcement-learning-flappy-bird)\n\n# Usage\n\n### Create Agent\n\nA reinforcement learning agent, say a Q-Learn agent, can be created with the following Java code:\n\n```java\nimport com.github.chen0040.rl.learning.qlearn.QAgent;\n\nint stateCount = 100;\nint actionCount = 10;\nQAgent agent = new QAgent(stateCount, actionCount);\n```\n\nThe agent created has a state map of 100 states and 10 different actions available for selection.\n\nFor Q-Learn and SARSA, the eligibility trace lambda can be enabled by calling:\n\n```java\nagent.enableEligibilityTrace(lambda);\n```\n\n### Select Action\n\nAt each time step, an action can be selected by the agent by calling:\n\n```java\nint actionId = agent.selectAction().getIndex();\n```\n\nIf you want to limit the number of possible actions at each state (say the problem restricts the actions available in different states), then call:\n\n```java\nSet\u003cInteger\u003e actionsAvailableAtCurrentState = world.getActionsAvailable(agent);\nint actionTaken = agent.selectAction(actionsAvailableAtCurrentState).getIndex();\n```\n\nThe agent can also switch to a different action-selection policy available in the com.github.chen0040.rl.actionselection package; for example, the following code\nswitches the action-selection policy to 
soft-max:\n\n```java\nagent.getLearner().setActionSelection(SoftMaxActionSelectionStrategy.class.getCanonicalName());\n```\n\n### State-Action Update\n\nOnce the world state has been updated due to the agent's selected action, the agent's internal state-action Q matrix can be updated by calling:\n\n```java\nint newStateId = world.update(agent, actionTaken);\ndouble reward = world.reward(agent);\n\nagent.update(actionTaken, newStateId, reward);\n```\n\n# Sample code\n\n### Sample code for R-Learn\n\n```java\nimport com.github.chen0040.rl.learning.rlearn.RAgent;\n\nint stateCount = 100;\nint actionCount = 10;\nRAgent agent = new RAgent(stateCount, actionCount);\n\nRandom random = new Random();\nagent.start(random.nextInt(stateCount));\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction().getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n\n agent.update(actionId, newStateId, reward);\n}\n```\n\nAlternatively, you can use RLearner if you want to learn after the episode:\n\n```java\n\nclass Move {\n    int oldState;\n    int newState;\n    int action;\n    double reward;\n    \n    public Move(int oldState, int action, int newState, double reward) {\n        this.oldState = oldState;\n        this.newState = newState;\n        this.reward = reward;\n        this.action = action;\n    }\n}\n\nint stateCount = 100;\nint actionCount = 10;\nRLearner agent = new RLearner(stateCount, actionCount);\n\nRandom random = new Random();\nint currentState = random.nextInt(stateCount);\nList\u003cMove\u003e moves = new ArrayList\u003c\u003e();\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction(currentState).getIndex();\n System.out.println(\"Agent does 
action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n int oldStateId = currentState;\n moves.add(new Move(oldStateId, actionId, newStateId, reward));\n currentState = newStateId;\n}\n\nfor(int i=moves.size()-1; i \u003e= 0; --i){\n    Move move = moves.get(i);\n    agent.update(move.oldState, move.action, move.newState, world.getActionsAvailableAtState(move.newState), move.reward);\n}\n\n```\n\n### Sample code for Q-Learn\n\n```java\nimport com.github.chen0040.rl.learning.qlearn.QAgent;\n\nint stateCount = 100;\nint actionCount = 10;\nQAgent agent = new QAgent(stateCount, actionCount);\n\nRandom random = new Random();\nagent.start(random.nextInt(stateCount));\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction().getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n\n agent.update(actionId, newStateId, reward);\n}\n```\n\nAlternatively, you can use QLearner if you want to learn after the episode:\n\n```java\n\nclass Move {\n    int oldState;\n    int newState;\n    int action;\n    double reward;\n    \n    public Move(int oldState, int action, int newState, double reward) {\n        this.oldState = oldState;\n        this.newState = newState;\n        this.reward = reward;\n        this.action = action;\n    }\n}\n\nint stateCount = 100;\nint actionCount = 10;\nQLearner agent = new QLearner(stateCount, actionCount);\n\nRandom random = new Random();\nint currentState = random.nextInt(stateCount);\nList\u003cMove\u003e moves = new ArrayList\u003c\u003e();\nfor(int time=0; time \u003c 
1000; ++time){\n\n int actionId = agent.selectAction(currentState).getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n int oldStateId = currentState;\n moves.add(new Move(oldStateId, actionId, newStateId, reward));\n currentState = newStateId;\n}\n\nfor(int i=moves.size()-1; i \u003e= 0; --i){\n    Move move = moves.get(i);\n    agent.update(move.oldState, move.action, move.newState, move.reward);\n}\n\n```\n\n### Sample code for SARSA\n\n```java\nimport com.github.chen0040.rl.learning.sarsa.SarsaAgent;\n\nint stateCount = 100;\nint actionCount = 10;\nSarsaAgent agent = new SarsaAgent(stateCount, actionCount);\n\nRandom random = new Random();\nagent.start(random.nextInt(stateCount));\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction().getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n\n agent.update(actionId, newStateId, reward);\n}\n```\n\nAlternatively, you can use SarsaLearner if you want to learn after the episode:\n\n```java\n\nclass Move {\n    int oldState;\n    int newState;\n    int action;\n    double reward;\n    \n    public Move(int oldState, int action, int newState, double reward) {\n        this.oldState = oldState;\n        this.newState = newState;\n        this.reward = reward;\n        this.action = action;\n    }\n}\n\nint stateCount = 100;\nint actionCount = 10;\nSarsaLearner agent = new SarsaLearner(stateCount, actionCount);\n\nRandom random = new Random();\nint currentState = random.nextInt(stateCount);\nList\u003cMove\u003e moves = new ArrayList\u003c\u003e();\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction(currentState).getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n int oldStateId = currentState;\n moves.add(new Move(oldStateId, actionId, newStateId, reward));\n currentState = newStateId;\n}\n\nfor(int i=moves.size()-1; i \u003e= 0; --i){\n    Move current_move = moves.get(i);\n    Move next_move = (i == moves.size()-1) ? current_move : moves.get(i+1);\n    agent.update(current_move.oldState, current_move.action, current_move.newState, next_move.action, current_move.reward);\n}\n```\n\n### Sample code for Actor Critic Model\n\n```java\nimport com.github.chen0040.rl.learning.actorcritic.ActorCriticAgent;\nimport com.github.chen0040.rl.utils.Vec;\n\nint stateCount = 100;\nint actionCount = 10;\nActorCriticAgent agent = new ActorCriticAgent(stateCount, actionCount);\nVec stateValues = new Vec(stateCount);\n\nRandom random = new Random();\nagent.start(random.nextInt(stateCount));\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction().getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n\n System.out.println(\"World state values changed ...\");\n for(int stateId = 0; stateId \u003c stateCount; ++stateId){\n    stateValues.set(stateId, random.nextDouble());\n }\n\n agent.update(actionId, newStateId, reward, stateValues);\n}\n```\n\nAlternatively, you can use ActorCriticLearner if you want to learn after the 
episode:\n\n```java\n\nclass Move {\n    int oldState;\n    int newState;\n    int action;\n    double reward;\n    \n    public Move(int oldState, int action, int newState, double reward) {\n        this.oldState = oldState;\n        this.newState = newState;\n        this.reward = reward;\n        this.action = action;\n    }\n}\n\nint stateCount = 100;\nint actionCount = 10;\nActorCriticLearner agent = new ActorCriticLearner(stateCount, actionCount);\n\nRandom random = new Random();\nint currentState = random.nextInt(stateCount);\nList\u003cMove\u003e moves = new ArrayList\u003c\u003e();\nfor(int time=0; time \u003c 1000; ++time){\n\n int actionId = agent.selectAction(currentState).getIndex();\n System.out.println(\"Agent does action-\"+actionId);\n \n int newStateId = world.update(agent, actionId);\n double reward = world.reward(agent);\n\n System.out.println(\"Now the new state is \" + newStateId);\n System.out.println(\"Agent receives Reward = \"+reward);\n int oldStateId = currentState;\n moves.add(new Move(oldStateId, actionId, newStateId, reward));\n currentState = newStateId;\n}\n\nfor(int i=moves.size()-1; i \u003e= 0; --i){\n    Move current_move = moves.get(i);\n    Move next_move = (i == moves.size()-1) ? current_move : moves.get(i+1);\n    agent.update(current_move.oldState, current_move.action, current_move.newState, next_move.action, current_move.reward);\n}\n\n```\n\n### Save and Load RL models\n\nTo save the trained RL model (say QLearner):\n\n```java\nQLearner learner = new QLearner(stateCount, actionCount);\ntrain(learner);\nString json = learner.toJson();\n```\n\nTo load the trained RL model from json:\n\n```java\nQLearner learner = 
QLearner.fromJson(json);\n```\n\n","funding_links":[],"categories":["Artificial Intelligence"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fjava-reinforcement-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchen0040%2Fjava-reinforcement-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchen0040%2Fjava-reinforcement-learning/lists"}