{"id":19704885,"url":"https://github.com/lsunsi/markovjs","last_synced_at":"2025-08-01T19:36:47.378Z","repository":{"id":57291401,"uuid":"74908818","full_name":"lsunsi/markovjs","owner":"lsunsi","description":"Reinforcement Learning in JavaScript","archived":false,"fork":false,"pushed_at":"2016-12-03T19:35:30.000Z","size":49,"stargazers_count":76,"open_issues_count":0,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-06-28T11:03:22.813Z","etag":null,"topics":["javascript","machine-learning","markov-decision-processes","reinforcement-learning"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lsunsi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-27T19:20:11.000Z","updated_at":"2024-11-16T14:10:40.000Z","dependencies_parsed_at":"2022-08-27T16:50:39.956Z","dependency_job_id":null,"html_url":"https://github.com/lsunsi/markovjs","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/lsunsi/markovjs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsunsi%2Fmarkovjs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsunsi%2Fmarkovjs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsunsi%2Fmarkovjs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsunsi%2Fmarkovjs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lsunsi","download_url":"https://codeload.github.com/lsunsi/markovjs/tar.gz/refs/heads/master","sbom_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lsunsi%2Fmarkovjs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267858181,"owners_count":24155917,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","machine-learning","markov-decision-processes","reinforcement-learning"],"created_at":"2024-11-11T21:24:56.241Z","updated_at":"2025-08-01T19:36:47.344Z","avatar_url":"https://github.com/lsunsi.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# markovjs\n\n###### npm install markovjs\n\n[![Release](https://img.shields.io/badge/Release-0.1.3-blue.svg?style=flat-square)](https://github.com/lsunsi/markovjs/releases)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](https://github.com/lsunsi/markovjs/blob/master/LICENSE)\n\nThis is a reference implementation of a basic reinforcement learning environment.\nIt is intended as a playground for anyone interested in this field.\n\nMy goal is to provide a minimal and clean implementation of the main concepts, so you can:\n- Plug in some problem you want to try to solve and play around\n- Understand what's going on and how does the agent learn\n- Extend functionality via custom data types or functions\n\nWhat's inside:\n- Basic TD(0) 
value iteration algorithm\n- Basic memory implementation\n- Common policies\n\n## Getting Started\nThis package exports a function that provides the environment you'll need to try your own problems.\n\nThere are three components required for the learning to start:\n- a game implementation\n- a memory implementation\n- policies for the agent to follow\n\nThe environment provides helpful methods to set those up, train an agent and replay its findings within your game.\nThis example shows basic usage of this package; each step is explained in its own section below, in order.\n\n```javascript\nimport markov from 'markovjs'\nimport {egreedy} from 'markovjs/policies'\nimport * as memory from 'markovjs/memory'\nimport * as game from './game'\n\nconst α = 0.1 // learning rate\nconst γ = 0.9 // discount factor\nconst ε = 0.1 // exploration rate\n\nmarkov() // creates an environment\n  .game(game, game.initial) // sets up the game\n  .memory(memory, memory.init(0.0)) // sets up the memory\n  .policies(egreedy(ε)) // sets up the policies\n  .train(100, α, γ) // train for one hundred episodes\n  .play(episode =\u003e { /* play time! 
*/ })\n```\n\n### **.game** *(game: Game\u003cG\u003e, initialGameState: G)*\n*sets up the game for the learning environment*\n\nIt takes the game **implementation** as its first argument and the game **initial state** as the second one.\nThis initial game state will be used in all game simulations and can only be changed by calling this method again.\n\nYou provide the game implementation, following this interface:\n\n```javascript\n// A: Action type\n// G: Game state type\ntype Game\u003cA, G\u003e = {\n  actions: G =\u003e Array\u003cA\u003e, // what are the allowed actions for the given state?\n  act: (G, A) =\u003e G, // what state results from taking the given action in the given state?\n  reward: (G, G) =\u003e number, // what is the reward for the transition between the two given states?\n  final: G =\u003e boolean // is the given state final?\n}\n```\n\nThis is generally all you need to implement in order to use this package.\n\n*That's not to say you shouldn't mess around anywhere else if you feel like it.*\n\n###### tips\n- Need an example? [grid-world](https://github.com/lsunsi/markovjs-gridworld) and n-armed-bandit *(coming soon)*\n- The way you model your problem affects the agent's ability to learn it. State is what your agent sees and the reward is what it seeks!\n- There might be constraints on your state implementation depending on the memory implementation you use. 
Check out the memory section for more info\n\n### **.memory** *(memory: Memory\u003cM\u003e, initialMemoryState: M)*\n*sets up the memory for the learning environment*\n\nThis method is analogous to `.game`.\nIt takes the memory **implementation** as its first argument and the memory **initial state** as the second one.\n\nThis package provides a basic implementation for the memory that can be used out of the box.\nIt includes both required functions plus an extra `init` function that returns an empty memory state.\nThe `init` function takes a number to be used as the initial value for all unset state-action pairs.\n\n```javascript\nimport * as memory from 'markovjs/memory'\n\nconst m0 = memory.init(0.1) // this means all values default to 0.1\nconst m1 = memory.update(m0, 0, 1, v =\u003e v + 2.0) // updates the value for G=0 A=1\nconst rater = memory.rater(m1, 0) // gets rater for G=0\nrater(0) // rates G=0 A=0, which gives out 0.1\nrater(1) // rates G=0 A=1, which gives out 2.1\n```\n\n**This memory implementation relies on the `toString` method to compare your game states.**\nThis means that for this memory to work correctly, you need to make sure the string returned by `toString` for your game state really represents it.\n\n###### tips\n- You might have to implement a custom `toString` method for your state type. Need an example? [grid-world](https://github.com/lsunsi/markovjs-gridworld)\n- Don't feel like implementing the `toString` method? Check out [this memory implementation](https://github.com/lsunsi/markovjs-immutable)\n- Most of the heavy lifting is done by the memory. Want to speed things up? 
Roll your own faster memory implementation!\n\n### **.policies** *(move: Policy, learn: Policy = move, play: Policy = learn)*\n*sets up the policies to be followed by the agent in the learning environment*\n\nIt takes one required policy (move) and two optional ones (learn and play).\nAn omitted policy defaults to the previous one.\nThe policies are used by the agent as follows:\n\n- move: the one followed while learning\n- learn: the one expected to be learned\n- play: the one followed while playing\n\nThis package provides implementations of the most popular policies used in this type of learning algorithm.\n```javascript\nimport * as policies from 'markovjs/policies'\n\npolicies.random // always chooses a random action\npolicies.greedy // always chooses the action with the highest expected return\npolicies.egreedy(0.1) // acts randomly with 0.1 chance and greedily with 0.9 chance\n```\n\n###### tips\n- Use the greedy policy carefully, since it can lead to infinite loops when training or playing\n- If your agent follows and learns the same policy during training, call it [SARSA](https://en.wikipedia.org/wiki/State-Action-Reward-State-Action)\n- If your agent follows one policy while learning the greedy one, call it [Q-Learning](https://en.wikipedia.org/wiki/Q-learning)\n\n### **.train** *(sessions: number, alpha: number, gamma: number)*\n*trains an agent using the game, memory and policies previously set*\n\nIts first argument is the number of **sessions** (episodes) to train your agent for.\nThe second and third ones are the **learning rate** and **discount factor** parameters.\n\nThis method will mutate the environment's memory to reflect the agent's learning.\nHow long this method takes to run depends on both your game's episode length and your agent's performance.\n\n*Meaning it will not take forever unless your agent is both really stubborn and really disciplined.*\n\n###### tips\n- Both the **learning rate** and **discount factor** are 
problem-specific.\n- How many sessions does it take to learn the problem? Great question.\n\n### **.play** *(callback: Episode =\u003e void)*\n*generates a playing episode using current game, memory and policy settings*\n\nThe only parameter taken by this function is a callback that receives the resulting episode.\n\nAn episode is a JavaScript **iterator** of `Transitions`.\n\n```javascript\nexport type Transition\u003cA, G\u003e = {|\n  gameState: G, // state the agent was at\n  action: A, // the action it took\n  nextGameState: G, // where the action led\n  reward: number // what the agent got out of it\n|}\n```\n\n###### tips\n- The episode isn't guaranteed to be finite *(especially if your agent is too greedy)*\n- The reward sum is what your agent is trying to maximize!\n\n## Going Deeper\nNot satisfied with the included memory implementation?\nWant to try out a custom policy?\nIs this training environment too simple for you?\n\nThis section will expose the main data types and abstractions adopted in this package.\n\n*Let me know if you code something awesome with them.*\n\n### Memory\nThe included memory implementation is supposed to be basic and easy to understand.\nOther implementations might focus on performance or even new functionality.\n\nIf you want to implement your own, here's what you need to code:\n```javascript\n// A: Action type\n// G: Game state type\n// M: Memory type\nexport type Memory\u003cA, G, M\u003e = {\n  update: (M, G, A, number =\u003e number) =\u003e M, // maps memory value for (G, A) pair using given function\n  rater: (M, G) =\u003e (A) =\u003e number // returns a function that rates actions for state G\n}\n```\n\n### Policy\nIf you want to implement your own policies, it is just as easy as writing a simple function.\nYou probably won't need to, since the ones included should have you covered.\n*I sure won't stop you though, so here is the expected signature:*\n\n```javascript\nexport type Policy\u003cA\u003e = (\n  Array \u003cA\u003e, // the 
array of actions to choose from\n  A =\u003e number // a function that returns the expected return of an action\n) =\u003e A // chosen action\n```\n\n### Misc\nTo implement the learning environment, I found it useful to code these two primitives:\n- **Move**: makes a step from given game state following given policy using given memory state.\n- **Learn**: updates the memory using a 1-step value iteration function, simulating the next move in given game with given policy and memory.\n\nYou might find these functions useful when coding your own extensions, so here are their signatures:\n\n```javascript\nexport type Move\u003cA, G, M\u003e = (\n  Game\u003cA, G\u003e,\n  G,\n  Memory\u003cA, G, M\u003e,\n  M,\n  Policy\u003cA\u003e\n) =\u003e Transition\u003cA, G\u003e\n```\n```javascript\nexport type Learn\u003cA, G, M\u003e = (\n  Game\u003cA, G\u003e,\n  Transition\u003cA, G\u003e,\n  Memory\u003cA, G, M\u003e,\n  M,\n  Policy\u003cA\u003e\n) =\u003e M\n```\n\n## What Next\n- [grid-world game example](https://github.com/lsunsi/markovjs-gridworld)\n- [immutable memory implementation](https://github.com/lsunsi/markovjs-immutable)\n\n## Coming Soon\n- n-armed-bandit game example\n- eligibility traces support\n- function approximation support\n\n\n## Thanks\nSeriously, for reading this whole doc.\n\n*You're awesome.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsunsi%2Fmarkovjs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flsunsi%2Fmarkovjs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flsunsi%2Fmarkovjs/lists"}