{"id":19446092,"url":"https://github.com/stitchfix/mab","last_synced_at":"2025-07-16T14:37:08.287Z","repository":{"id":41336056,"uuid":"340162521","full_name":"stitchfix/mab","owner":"stitchfix","description":"Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.","archived":false,"fork":false,"pushed_at":"2025-04-08T14:46:16.000Z","size":80,"stargazers_count":54,"open_issues_count":20,"forks_count":7,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-09T00:09:48.424Z","etag":null,"topics":["data-science","experimentation","go","golang","multi-armed-bandit","multi-armed-bandits","multiarmed-bandits","reinforcement-learning","thompson","thompson-sampling"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stitchfix.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-02-18T19:57:42.000Z","updated_at":"2025-03-15T03:39:15.000Z","dependencies_parsed_at":"2024-12-27T09:10:39.639Z","dependency_job_id":"d6a2d0c4-c40c-4271-8099-71c6e8e64e14","html_url":"https://github.com/stitchfix/mab","commit_stats":{"total_commits":20,"total_committers":2,"mean_commits":10.0,"dds":0.4,"last_synced_commit":"9ee13de00f64b52d3cf6518de950884343be568c"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/stitchfix/mab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stitchfix%2Fmab","tags_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stitchfix%2Fmab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stitchfix%2Fmab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stitchfix%2Fmab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stitchfix","download_url":"https://codeload.github.com/stitchfix/mab/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stitchfix%2Fmab/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265518655,"owners_count":23780997,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","experimentation","go","golang","multi-armed-bandit","multi-armed-bandits","multiarmed-bandits","reinforcement-learning","thompson","thompson-sampling"],"created_at":"2024-11-10T16:12:54.646Z","updated_at":"2025-07-16T14:37:08.229Z","avatar_url":"https://github.com/stitchfix.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Mab\nMulti-Armed Bandits Go Library\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/5180129/108548622-f2df8200-72a0-11eb-8cc2-b4f1e839dffd.png\" width=\"720\"\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n\t\u003ca href=\"https://github.com/stitchfix/mab/actions/workflows/go.yml\"\u003e\u003cimg src=\"https://github.com/stitchfix/mab/actions/workflows/go.yml/badge.svg\" alt=\"Build Status\"\u003e\u003c/img\u003e\u003c/a\u003e\n\t\u003ca 
href=\"https://goreportcard.com/report/github.com/stitchfix/mab\"\u003e\u003cimg src=\"https://goreportcard.com/badge/github.com/stitchfix/mab\" alt=\"Go Report Card\"\u003e\u003c/img\u003e\u003c/a\u003e\n\t\u003ca href=\"https://pkg.go.dev/github.com/stitchfix/mab\"\u003e\u003cimg src=\"https://pkg.go.dev/badge/github.com/stitchfix/mab.svg\" alt=\"Go Reference\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n* [Description](#description)\n* [Installation](#installation)\n* [Usage](#usage)\n  + [Creating a bandit and selecting arms](#bandit)\n  + [Numerical integration with `numint`](#numint)\n* [Documentation](#documentation)\n* [License](#license)\n\n## Description\n\n### What it is\n\nMab is a library/framework for scalable and customizable multi-armed bandits. It provides efficient pseudo-random\nimplementations of epsilon-greedy and Thompson sampling strategies. Arm-selection strategies are decoupled from reward\nmodels, allowing Mab to be used with any reward model whose output can be described as a posterior distribution or point\nestimate for each arm.\n\nMab also provides a numerical one-dimensional integration package, `numint`, which was developed for use by the Mab\nThompson sampler but can also be used as a standalone for numerical integration.\n\n### What it isn't\n\nMab is not concerned with building, training, or updating bandit reward models. 
It is focused on efficient pseudo-random\narm selection given the output of a reward model.\n\n## Installation\n\n```\ngo get -u github.com/stitchfix/mab\n```\n\n## Usage\n\n### Bandit\n\nA `Bandit` consists of three components: a `RewardSource`, a `Strategy` and a `Sampler`.\nMab provides implementations of each of these, but you are encouraged to implement your own as well!\nEach component is defined by a single-method interface, making it relatively simple to fully customize a Mab bandit.\n\nExample:\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\t\"log\"\n\n\t\"github.com/stitchfix/mab\"\n\t\"github.com/stitchfix/mab/numint\"\n)\n\nfunc main() {\n\n\trewards := map[string][]mab.Dist{\n\t\t\"us\": {\n\t\t\tmab.Beta(40, 474),\n\t\t\tmab.Beta(64, 730),\n\t\t\tmab.Beta(71, 818),\n\t\t},\n\t\t\"uk\": {\n\t\t\tmab.Beta(25, 254),\n\t\t\tmab.Beta(100, 430),\n\t\t\tmab.Beta(30, 503),\n\t\t},\n\t}\n\n\tbandit := mab.Bandit{\n\t\tRewardSource: \u0026mab.ContextualRewardStub{rewards},\n\t\tStrategy:     mab.NewThompson(numint.NewQuadrature()),\n\t\tSampler:      mab.NewSha1Sampler(),\n\t}\n\n\tresult, err := bandit.SelectArm(context.Background(), \"user_id:12345\", \"us\")\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfmt.Println(result)\n}\n```\n\n`SelectArm` will get the reward estimates from the `RewardSource`, compute arm-selection probabilities using\nthe `Strategy` and select an arm using the `Sampler`.\n\nThere is an unfortunate name collision between Go's `context.Context` type and the context of a contextual bandit.\nIn Mab, the `context.Context` variables will always be named `ctx`, while the variables used for bandit context will be called `banditContext`.\n\nGo's `context.Context` should be used to pass request-scoped data to the RewardSource, and it is best practice to only use it for cancellation propagation or passing non-controlling data such as request IDs.\n\nThe values needed by the contextual bandit to determine the reward estimates should be passed 
using the last argument, which is named `banditContext`.\n\nThe `unit` input to `SelectArm` is a string that is used for enabling deterministic outcomes. This is useful for\ndebugging and testing, but can also be used to ensure that users get a consistent experience in between updates to the bandit reward model.\nBandits are expected to always provide the same arm selection for the same set of reward estimates and input unit string.\n\nThe output of `SelectArm` is a struct containing the reward estimates, computed probabilities, and selected arm.\n\n#### RewardSource\n\nA `RewardSource` is expected to provide up-to-date reward estimates for each arm, given some context data.\nMab provides a basic implementation (`HTTPSource`) that can be used for requesting rewards from an HTTP service, and some stubs that can be used for testing and development.\n\n```go\ntype RewardSource interface {\n    GetRewards(context.Context, interface{}) ([]Dist, error)\n}\n```\n\nA typical `RewardSource` implementation is expected to get reward estimates from a database, a cache, or via an HTTP request to a\ndedicated reward service. Since a `RewardSource` is likely to require a call to some external service, the `GetRewards`\nmethod includes a `context.Context`-type argument. This enables Mab bandits to be used in web services that need to pass\nrequest-scoped data such as request timeouts and cancellation propagation. The second argument should be used to pass bandit context data to the reward source.\nThe reward source must return one distribution per arm, conditional on the bandit context.\n\n##### Distributions\n\nReward estimates are represented as a `Dist` for each arm.\n\n```go\ntype Dist interface {\n    CDF(x float64) float64\n    Mean() float64\n    Prob(x float64) float64\n    Rand() float64\n    Support() (float64, float64)\n}\n```\n\nMab includes implementations of beta, normal, and point distributions. 
The beta and normal distributions wrap and\nextend [gonum](https://github.com/gonum/gonum/tree/master/stat/distuv) implementations, so they are performant and\nreliable.\n\nMab lets you combine any distribution with any strategy, although some combinations don't make sense in practice. \n\nFor epsilon greedy, you will most likely use `Point` distributions, since the algorithm only cares about the mean of the reward estimate.\nOther distributions can be used, as long as they implement a `Mean()` that returns well-defined values.\n\nFor Thompson sampling, it is recommended to use `Normal` or `Beta` distributions. Since Thompson sampling is based on sampling from finite-width distributions, you won't get a useful bandit by using `Point` distributions with the `Thompson` strategy.\n\nThe `Null()` function returns a `Point` distribution at negative infinity (`math.Inf(-1)`). This indicates to the `Strategy` that this arm should never be selected. Each `Strategy` must account for any number of Null distributions and return zero probability for the null arms and the correct set of probabilities for the non-null arms, as if the null arms were not present.\n\n#### Strategy\n\nA Mab `Strategy` computes arm-selection probabilities from the set of reward estimates.\n\nMab provides the following strategies:\n\n- Thompson sampling (`mab.Thompson`)\n- Epsilon-greedy (`mab.EpsilonGreedy`)\n- Proportional (`mab.Proportional`)\n\nMab also provides a Monte-Carlo based Thompson-sampling strategy (`mab.ThompsonMC`) but it is much slower and less accurate than `mab.Thompson`, which is based on numerical integration. 
It is not recommended to use `ThompsonMC` in production.\n\n##### Thompson sampling\n\nThe Thompson sampling strategy computes arm-selection probabilities using the following formula:\n\n![thompson sampling formula](https://user-images.githubusercontent.com/5180129/108559544-4a391e80-72b0-11eb-825c-483aba3dcd18.png)\n\nThat is, the probability of selecting an arm under Thompson sampling is the integral of that arm's posterior\nPDF times the posterior CDFs of all other arms. The derivation of this formula is left as an exercise for the reader.\n\nComputing these probabilities requires one-dimensional integration, which is provided by the `numint` subpackage.\n\nThe limits of integration are determined by the `Support` of the arms' distribution, so `Point` distributions will always get zero probability using Thompson sampling.\n\n##### Epsilon-greedy\n\nThis is the basic epsilon-greedy selection strategy. The probability of selecting an arm under epsilon greedy is readily\ncomputed from a closed-form solution without the need for numerical integration. It is based on the `Mean` of the reward estimate.\n\n##### Proportional\n\nThe proportional sampler computes arm selection probabilities proportional to some input weights. This is not a real\nbandit strategy, but exists to allow users to effectively shift the interface between reward sources and bandit\nstrategies. You can create a `RewardSource` that returns the desired selection weights as `Point` distributions and then\nuse the `Proportional` strategy to make sure that the sampler uses the normalized weights as the probability distribution for arm selection.\n\n#### Sampler\n\nA Mab `Sampler` selects an arm given the set of selection probabilities and a string. 
The default sampler implementation\nuses the SHA1 hash of the input string (mod 1000) to determine the arm.\n\n### Numint\n\nThe Thompson sampling strategy depends on an integrator for computing probabilities.\n\n```go\ntype Integrator interface {\n    Integrate(f func (float64) float64, a, b float64) (float64, error)\n}\n```\n\nThe `numint` package provides a quadrature-based implementation that can be used for Thompson sampling. It can be used\neffectively with just the default settings, or can be fully customized by the user.\n\nThe default quadrature rule and other parameters for the `numint` quadrature integrator have been found through\ntrial-and-error to provide a good tradeoff between speed and reliability for a wide range of inputs including many\ncombinations of normal and beta distributions.\n\nSee the `numint` README and documentation for more details.\n\n## Documentation\n\nMore detailed refence docs can be found on [pkg.go.dev](https://pkg.go.dev/github.com/stitchfix/mab)\n\n## License\n\nMab and Numint are licensed under the Apache 2.0 license. See the LICENSE file for terms and conditions for use, reproduction, and\ndistribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstitchfix%2Fmab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstitchfix%2Fmab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstitchfix%2Fmab/lists"}