Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alextanhongpin/go-bandit
Multi-Armed Bandit (MAB) algorithm implementation in Go
- Host: GitHub
- URL: https://github.com/alextanhongpin/go-bandit
- Owner: alextanhongpin
- Created: 2017-10-21T18:15:28.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-11-25T13:01:12.000Z (about 5 years ago)
- Last Synced: 2024-06-20T10:18:31.210Z (7 months ago)
- Topics: go, greedy-epsilon, mulit-arm-bandit, ucb1
- Language: Go
- Size: 77.1 KB
- Stars: 24
- Watchers: 3
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
[![](https://godoc.org/github.com/alextanhongpin/go-bandit?status.svg)](http://godoc.org/github.com/alextanhongpin/go-bandit)
# Multi-Armed Bandit Algorithm
Bandit algorithms balance _exploration_ and _exploitation_, and are part of **reinforcement learning**.
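As a rough illustration of that balance, below is a minimal epsilon-greedy selection sketch. This is not this library's API, just a standalone sketch that assumes a running mean reward is tracked per arm:

```golang
package main

import (
	"fmt"
	"math/rand"
)

// selectArm picks a random arm with probability epsilon (exploration),
// otherwise the arm with the highest estimated mean reward (exploitation).
func selectArm(epsilon float64, rewards []float64) int {
	if rand.Float64() < epsilon {
		return rand.Intn(len(rewards)) // explore: uniform random arm
	}
	best := 0
	for i, r := range rewards {
		if r > rewards[best] {
			best = i
		}
	}
	return best // exploit: best arm seen so far
}

func main() {
	fmt.Println(selectArm(0.1, []float64{0.45, 0.48, 0.51}))
}
```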
## Reinforcement Learning
In reinforcement learning, the system makes a decision based on the current situation. Unlike supervised learning, there is no training data telling the system what the correct decision would have been. Instead, the environment returns a reward indicating the quality of the decision that was made.
Reinforcement learning problems involve the following artifacts (see the sketch after this list):
- State
- Action or Decision
- Reward
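A minimal sketch of how these three pieces interact, using a made-up two-state environment (the environment and its reward probabilities here are hypothetical, purely for illustration):

```golang
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	// State: which of two contexts we observe (hypothetical).
	// Action: which of two arms we pull.
	// Reward: feedback from the environment for (state, action).
	rewardProb := [2][2]float64{{0.2, 0.8}, {0.7, 0.3}} // hypothetical

	total := 0.0
	for i := 0; i < 1000; i++ {
		state := rand.Intn(2)  // observe a state
		action := rand.Intn(2) // decide (here: randomly; a bandit would learn)
		reward := 0.0
		if rand.Float64() < rewardProb[state][action] {
			reward = 1.0 // the environment scores the decision
		}
		total += reward
	}
	fmt.Printf("average reward: %.2f\n", total/1000)
}
```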
## Implementations
- Random greedy
- Upper confidence bound one (UCB1; see the sketch after this list)
- Upper confidence bound two
- Softmax
- Interval estimate
- Thompson sampling
- Reward comparison
- Action pursuit
- Exponential weight
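As a taste of how these differ from epsilon greedy, here is a minimal UCB1 selection sketch (illustrative only, not this repository's implementation). Each arm's score is its mean reward plus an exploration bonus that shrinks as the arm is played more:

```golang
package main

import (
	"fmt"
	"math"
)

// selectArmUCB1 returns the arm maximizing mean reward plus the
// exploration bonus sqrt(2 ln n / n_i), where n is the total number of
// plays and n_i the number of plays of arm i. Unplayed arms go first.
func selectArmUCB1(counts []int, rewards []float64) int {
	n := 0
	for i, c := range counts {
		if c == 0 {
			return i // play every arm at least once
		}
		n += c
	}
	best, bestScore := 0, math.Inf(-1)
	for i, c := range counts {
		score := rewards[i] + math.Sqrt(2*math.Log(float64(n))/float64(c))
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

func main() {
	// Arm 2 wins here: its mean is highest and it has been played least.
	fmt.Println(selectArmUCB1([]int{10, 5, 2}, []float64{0.5, 0.4, 0.6}))
}
```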
## TODO
- Implement tests for the different algorithms
- Plot the performance of the different algorithms
- Document pros/cons for each algorithm
- Add use cases and examples

## References
- https://www.quora.com/What-is-Thompson-sampling-in-laymans-terms
- https://www.linkedin.com/pulse/dynamic-price-optimization-multi-arm-bandit-pranab-ghosh
- https://pkghosh.wordpress.com/2013/08/25/bandits-know-the-best-product-price/
- https://github.com/pranab/avenir
- https://www.gsb.stanford.edu/sites/gsb/files/mkt_10_17_misra.pdf
- http://alekhagarwal.net/bandits_and_rl/intro.pdf

## Usage
```golang
package main

import (
	"log"
	"math/rand"
	"sync"
	"time"

	bandit "github.com/alextanhongpin/go-bandit"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

func main() {
	// Epsilon-greedy bandit that explores 10% of the time.
	b, err := bandit.NewEpsilonGreedy(0.1, nil, nil)
	if err != nil {
		log.Println(err)
	}
	b.Init(5) // five arms

	N := 1000
	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			// Pick an arm, observe a simulated 0/1 reward, and update.
			chosenArm := b.SelectArm(rand.Float64())
			reward := float64(rand.Intn(2))
			b.Update(chosenArm, reward)
		}()
	}
	wg.Wait()

	log.Printf("bandit: %+v", b)
	log.Println("done")
}
```

Test for data race:
```
$ go run -race main.go
```

Output:
```
2018/06/04 23:43:27 bandit: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} Epsilon:0.1 Counts:[233 220 512 19 16] Rewards:[0.4592274678111587 0.48181818181818176 0.5097656249999998 0.3684210526315789 0.25]}
2018/06/04 23:43:27 done
```

## Stats
```
$ brew install tokei
$ tokei
```

Output:
```
-------------------------------------------------------------------------------
Language Files Lines Code Comments Blanks
-------------------------------------------------------------------------------
Go 12 1120 879 74 167
Markdown 1 103 103 0 0
Python 1 18 11 2 5
-------------------------------------------------------------------------------
Total 15 1241 993 76 172
-------------------------------------------------------------------------------
```