[![](https://godoc.org/github.com/alextanhongpin/go-bandit?status.svg)](http://godoc.org/github.com/alextanhongpin/go-bandit)

# Multi-Armed Bandit Algorithm

A bandit algorithm balances _exploration_ (trying arms to learn their payoffs) and _exploitation_ (playing the arm that currently looks best), and is part of **reinforcement learning**.

## Reinforcement Learning

In reinforcement learning, the system makes a decision based on the current situation. Unlike supervised learning, there is no training data that tells the system the correct decision. Instead, the system receives a reward from the environment indicating the quality of the decision that was made.

Reinforcement learning problems involve the following artifacts (the sketch after this list shows how they map onto a bandit API).

- State
- Action or Decision
- Reward
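
In the bandit setting there is no state to track beyond the per-arm statistics, so the loop reduces to action and reward. The sketch below is illustrative: the `Bandit` interface mirrors the `SelectArm`/`Update` methods used in the Usage section, while `RandomBandit` and the hidden payout probabilities are invented for the example.

```golang
package main

import (
	"fmt"
	"math/rand"
)

// Bandit captures the two operations a bandit learner needs:
// choosing an action (an arm) and learning from its reward.
// The method shapes mirror this library's API; the interface
// itself is only for illustration.
type Bandit interface {
	SelectArm(probability float64) int
	Update(chosenArm int, reward float64)
}

// RandomBandit is a strawman that explores forever and never
// exploits: it picks arms uniformly and ignores rewards.
type RandomBandit struct{ nArms int }

func (r *RandomBandit) SelectArm(p float64) int        { return int(p * float64(r.nArms)) }
func (r *RandomBandit) Update(arm int, reward float64) {}

func main() {
	// Hidden payout probability per arm; the learner never sees these.
	payouts := []float64{0.1, 0.5, 0.9}

	var b Bandit = &RandomBandit{nArms: len(payouts)}
	total := 0.0
	for i := 0; i < 1000; i++ {
		arm := b.SelectArm(rand.Float64()) // action
		reward := 0.0                      // reward from the environment
		if rand.Float64() < payouts[arm] {
			reward = 1
		}
		b.Update(arm, reward)
		total += reward
	}
	fmt.Printf("average reward: %.2f\n", total/1000)
}
```

A purely random policy averages about 0.5 reward here, while always playing the best arm would average 0.9; the algorithms listed below try to close that gap.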

## Implementations

- Random (epsilon) greedy (see the sketch after this list)
- Upper confidence bound one (UCB1)
- Upper confidence bound two (UCB2)
- Softmax
- Interval estimate
- Thompson sampling
- Reward comparison
- Action pursuit
- Exponential weight
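
As a reference point for the list above, here is a minimal epsilon-greedy sketch. It is not this package's implementation (which also adds locking for concurrent use, as the Usage section shows); the `epsilonGreedy` struct, its fields, and the payout probabilities are invented for the example.

```golang
package main

import (
	"fmt"
	"math/rand"
)

// epsilonGreedy explores a random arm with probability epsilon and
// otherwise exploits the arm with the best observed mean reward.
type epsilonGreedy struct {
	epsilon float64
	counts  []int     // pulls per arm
	rewards []float64 // running mean reward per arm
}

func (e *epsilonGreedy) selectArm() int {
	if rand.Float64() < e.epsilon {
		return rand.Intn(len(e.counts)) // explore
	}
	best := 0
	for i, r := range e.rewards { // exploit: highest mean so far
		if r > e.rewards[best] {
			best = i
		}
	}
	return best
}

func (e *epsilonGreedy) update(arm int, reward float64) {
	e.counts[arm]++
	n := float64(e.counts[arm])
	// Incremental mean: new = old + (reward - old) / n.
	e.rewards[arm] += (reward - e.rewards[arm]) / n
}

func main() {
	e := &epsilonGreedy{epsilon: 0.1, counts: make([]int, 3), rewards: make([]float64, 3)}
	payouts := []float64{0.1, 0.5, 0.9} // hidden from the learner
	for i := 0; i < 1000; i++ {
		arm := e.selectArm()
		if rand.Float64() < payouts[arm] {
			e.update(arm, 1)
		} else {
			e.update(arm, 0)
		}
	}
	fmt.Printf("counts=%v rewards=%v\n", e.counts, e.rewards)
}
```

UCB1 replaces the coin flip with a deterministic bonus, picking the arm that maximizes `mean_i + sqrt(2 * ln(N) / n_i)`, so rarely tried arms still get explored.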

## TODO

- Implement tests for the different algorithms
- Plot the different algorithms
- Pros/cons for each algorithm
- Use cases and examples

## References

- https://www.quora.com/What-is-Thompson-sampling-in-laymans-terms
- https://www.linkedin.com/pulse/dynamic-price-optimization-multi-arm-bandit-pranab-ghosh
- https://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/
- https://github.com/pranab/avenir
- https://www.gsb.stanford.edu/sites/gsb/files/mkt_10_17_misra.pdf
- http://alekhagarwal.net/bandits_and_rl/intro.pdf

## Usage

```golang
package main

import (
	"log"
	"math/rand"
	"sync"
	"time"

	bandit "github.com/alextanhongpin/go-bandit"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

func main() {
	// Epsilon-greedy with 10% exploration; counts and rewards start empty.
	b, err := bandit.NewEpsilonGreedy(0.1, nil, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Initialize the bandit with 5 arms.
	b.Init(5)

	N := 1000

	// Pull the bandit from N concurrent goroutines to exercise its locking.
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			chosenArm := b.SelectArm(rand.Float64())
			reward := float64(rand.Intn(2)) // simulated 0/1 reward
			b.Update(chosenArm, reward)
		}()
	}

	wg.Wait()
	log.Printf("bandit: %+v", b)
	log.Println("done")
}
```

Test for data race:

```
$ go run -race main.go
```

Output:

```
2018/06/04 23:43:27 bandit: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} Epsilon:0.1 Counts:[233 220 512 19 16] Rewards:[0.4592274678111587 0.48181818181818176 0.5097656249999998 0.3684210526315789 0.25]}
2018/06/04 23:43:27 done
```

## Stats

```
$ brew install tokei
$ tokei
```

Output:

```
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Go                     12         1120          879           74          167
 Markdown                1          103          103            0            0
 Python                  1           18           11            2            5
-------------------------------------------------------------------------------
 Total                  15         1241          993           76          172
-------------------------------------------------------------------------------
```