https://github.com/datawraith/banditbench

A small benchmark of Multi-armed bandit algorithms for the Bernoulli bandit
Host: GitHub
URL: https://github.com/datawraith/banditbench
Owner: DataWraith
License: mit
Created: 2024-03-02T15:32:23.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-06-22T07:15:01.000Z (12 months ago)
Last Synced: 2025-06-22T08:18:35.822Z (12 months ago)
Language: Rust
Size: 503 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Bandit Bench

[![No Maintenance Intended](http://unmaintained.tech/badge.svg)](http://unmaintained.tech/)

This project is a small, unscientific benchmark of algorithms for the Bernoulli
Multi-Armed Bandit. It benchmarks my specific use-case of short-horizon problems
(500 arm pulls) with Bernoulli rewards (i.e., either there is a reward or there
is not, with a given probability).
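
To make the setting concrete, here is a minimal sketch of a Bernoulli bandit played for a fixed number of pulls. It is not taken from this project's source: the names `XorShift64`, `BernoulliBandit` and `run_random_policy` are hypothetical, and the tiny xorshift generator is only there so the example needs no external crates.

```rust
/// Tiny xorshift64* generator (an assumption of this sketch, used only
/// to avoid external dependencies).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The internal state must never be zero.
        XorShift64(seed.max(1))
    }

    /// Returns a float uniformly distributed in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D);
        // Keep the top 53 bits, which is what an f64 mantissa can hold.
        (bits >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// A Bernoulli bandit: arm `i` pays 1 with probability `means[i]`, else 0.
/// The bandit is assumed to have at least one arm.
pub struct BernoulliBandit {
    pub means: Vec<f64>,
}

impl BernoulliBandit {
    pub fn pull(&self, arm: usize, rng: &mut XorShift64) -> u32 {
        if rng.next_f64() < self.means[arm] { 1 } else { 0 }
    }
}

/// Plays `horizon` pulls, picking arms uniformly at random (the random
/// baseline), and returns the total reward collected.
pub fn run_random_policy(
    bandit: &BernoulliBandit,
    horizon: usize,
    rng: &mut XorShift64,
) -> u32 {
    let k = bandit.means.len();
    let mut total = 0;
    for _ in 0..horizon {
        // Clamp the index to guard against floating-point rounding.
        let arm = ((rng.next_f64() * k as f64) as usize).min(k - 1);
        total += bandit.pull(arm, rng);
    }
    total
}
```

Calling `run_random_policy(&bandit, 500, &mut rng)` corresponds to one short-horizon run as described above; a real policy would replace the uniformly random arm choice.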

Algorithms are only included in the benchmark if:

- They are easy to implement

- They do not depend on the time horizon explicitly

- They do not need (much) parameter tuning

## Algorithms

### Baselines

- Random Baseline (chooses arms randomly)

- Greedy Baseline (chooses the arm with the maximum average reward)

- ϵ-Greedy

- ϵ-Decreasing

- Explore Then Commit

- Least Failures (then most successes, then random)
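
As an illustration of the two simplest non-random baselines above, the following sketch shows one way to write the greedy and ϵ-greedy arm choices. It is not the project's implementation: the function names are hypothetical, treating untried arms as infinitely good is an assumption of this sketch, and the uniform draws are passed in by the caller so the functions stay deterministic.

```rust
/// Empirical mean reward of an arm. Untried arms get +infinity so that
/// every arm is pulled at least once (an assumption of this sketch).
fn empirical_mean(successes: u32, pulls: u32) -> f64 {
    if pulls == 0 {
        f64::INFINITY
    } else {
        successes as f64 / pulls as f64
    }
}

/// Greedy choice: index of the arm with the highest empirical mean.
/// Ties go to the lowest index.
pub fn greedy(successes: &[u32], pulls: &[u32]) -> usize {
    let mut best = 0;
    for arm in 1..pulls.len() {
        let candidate = empirical_mean(successes[arm], pulls[arm]);
        if candidate > empirical_mean(successes[best], pulls[best]) {
            best = arm;
        }
    }
    best
}

/// ϵ-greedy choice: explore a uniformly random arm with probability
/// `epsilon`, otherwise act greedily. `u_explore` and `u_arm` are
/// uniform draws in [0, 1) supplied by the caller.
pub fn epsilon_greedy(
    successes: &[u32],
    pulls: &[u32],
    epsilon: f64,
    u_explore: f64,
    u_arm: f64,
) -> usize {
    if u_explore < epsilon {
        let k = pulls.len();
        ((u_arm * k as f64) as usize).min(k - 1)
    } else {
        greedy(successes, pulls)
    }
}
```

An ϵ-decreasing variant would shrink `epsilon` over time before calling the same function; the schedule is left out here because the source does not specify it.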

### Bootstrap-based

- [Bootstrapped Thompson Sampling](https://arxiv.org/abs/1410.4009)

- [Garbage In, Reward Out](http://proceedings.mlr.press/v97/kveton19a/kveton19a.pdf) (PDF)

- [MARS](https://proceedings.neurips.cc/paper_files/paper/2023/hash/b84adff45775e92a45f0cd87c37f5ce9-Abstract-Conference.html)

- [Multiplier Bootstrap-based Exploration](https://arxiv.org/abs/2302.01543)

- [Perturbed-History Exploration](https://arxiv.org/abs/1902.10089)

- [ReBoot](https://arxiv.org/abs/2002.08436)

- Vanilla Residual Bootstrap

- [WB](https://arxiv.org/abs/1805.09793)

### Dueling-based

- [Bounded Dirichlet Sampling](https://arxiv.org/abs/2111.09724)

- [WR-SDA](https://arxiv.org/abs/2010.14323)

### Approximations of the Gittins Index

- Whittle's approximation (works best with a large simulation horizon)

- Brezzi and Lai's approximation

### Thompson Sampling-based

- [ϵ-Exploring Thompson Sampling](https://proceedings.mlr.press/v202/jin23b/jin23b.pdf) (PDF)

- [Information Relaxation Sampling (Finite Horizon)](https://arxiv.org/abs/1902.04251) (simplified version that uses an assumed, fixed horizon)

- [Non-Parametric Thompson Sampling](https://proceedings.mlr.press/v117/riou20a.html)

- Optimistic Thompson Sampling

- [Satisficing Thompson Sampling](https://arxiv.org/abs/1704.09028)

- [Thompson Sampling with Virtual Helping Agents (C3)](https://arxiv.org/abs/2209.08197)

- Thompson Sampling

### Upper Confidence Bound-based

- [BayesUCB](https://arxiv.org/abs/2306.09136)

- [Hellinger-UCB](https://arxiv.org/abs/2404.10207)

- [KL-UCB](https://arxiv.org/abs/1102.2490)

- [lil' UCB](https://arxiv.org/abs/1312.7308)

- [MOSS-anytime](http://proceedings.mlr.press/v48/degenne16.html)

- [RAVEN-UCB](https://arxiv.org/abs/2506.02933)

- [ReUCB](https://arxiv.org/abs/2106.12200)

- [UCB1](https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf) (PDF)

- [UCB1-Tuned](https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf) (PDF)

- [UCB-DT](https://arxiv.org/abs/2110.02690)

- [UCBT](https://arxiv.org/abs/2102.05263)
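
For readers unfamiliar with this family, the sketch below shows the classic UCB1 rule from the paper linked above: each arm's index is its empirical mean plus the exploration bonus `sqrt(2 ln t / n)`, where `t` is the total number of pulls so far and `n` the number of pulls of that arm. The function names are hypothetical and the code is not taken from this project.

```rust
/// UCB1 index of a single arm. Arms that were never pulled get
/// +infinity, so each arm is tried once before the bonus matters.
pub fn ucb1_index(successes: u32, pulls: u32, total_pulls: u32) -> f64 {
    if pulls == 0 {
        return f64::INFINITY;
    }
    let mean = successes as f64 / pulls as f64;
    let bonus = (2.0 * (total_pulls as f64).ln() / pulls as f64).sqrt();
    mean + bonus
}

/// Picks the arm with the largest UCB1 index (lowest index wins ties).
pub fn ucb1_select(successes: &[u32], pulls: &[u32]) -> usize {
    let total_pulls: u32 = pulls.iter().sum();
    let mut best = 0;
    for arm in 1..pulls.len() {
        let candidate = ucb1_index(successes[arm], pulls[arm], total_pulls);
        if candidate > ucb1_index(successes[best], pulls[best], total_pulls) {
            best = arm;
        }
    }
    best
}
```

The other entries in this list differ mainly in how the bonus term is computed, so they would replace `ucb1_index` while keeping the same argmax step.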

### Other

- [Batch Ensemble for MAB](https://arxiv.org/abs/2409.08570)

- [Boltzmann-Gumbel Exploration](https://arxiv.org/abs/1705.10257)

- [CODE](https://arxiv.org/abs/2310.14751)

- [EB-TCI](https://arxiv.org/abs/2206.05979)

- [EXP-IX](https://arxiv.org/abs/1506.03271)

- [Forced Exploration](https://arxiv.org/abs/2312.07285)

- [Gradient Bandit](https://arxiv.org/abs/2402.17235)

- [Kullback-Leibler Maillard Sampling](https://arxiv.org/abs/2304.14989)

- [POKER](https://link.springer.com/chapter/10.1007/11564096_42) (with a fixed horizon)

- [Risk-sensitive Satisficing (RS)](https://doi.org/10.1016/j.biosystems.2019.02.009)

- [SoftElim](https://arxiv.org/abs/2002.06772)

- [Softsatisficing](https://doi.org/10.1016/j.biosystems.2022.104633)

- [Tsallis-INF](https://arxiv.org/abs/1807.07623)

- [TS-UCB](https://arxiv.org/abs/2006.06372)

## Results

The following table shows the average rank and average runtime of each algorithm
across the five experiments further down in this file.

| Algorithm                                                   | Average Rank | Average Time (seconds) |

| ----------------------------------------------------------- | ------------ | ---------------------- |

| Batch Ensemble for MAB (m=0)                                | 15.2         | 0.1                    |

| IRS.FH (H=2)                                                | 16.4         | 1.64                   |

| UCB-DT (γ=0.90)                                             | 18.6         | 3.61                   |

| IRS.FH (H=3)                                                | 18.8         | 1.71                   |

| UCB-DT (γ=0.95)                                             | 19.2         | 3.44                   |

| IRS.FH (H=1)                                                | 19.8         | 1.52                   |

| UCB-DT (γ=0.75)                                             | 21.2         | 3.19                   |

| TS-UCB (100 samples)                                        | 23.6         | 92.48                  |

| UCB-DT (γ=1.00)                                             | 23.8         | 3.23                   |

| ϵ-Exploring TS-UCB (1 samples)                              | 26.0         | 0.23                   |

| Batch Ensemble for MAB (m=1)                                | 27.0         | 0.12                   |

| ϵ-Exploring TS-UCB (10 samples)                             | 27.0         | 1.12                   |

| IRS.FH (H=4)                                                | 27.2         | 1.73                   |

| ϵ-Exploring TS-UCB (100 samples)                            | 27.2         | 8.87                   |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.99)    | 27.6         | 0.52                   |

| SoftElim (θ=0.01)                                           | 29.6         | 0.44                   |

| Gittins Index -- Whittle's Approximation (β=0.99)           | 29.8         | 0.29                   |

| MOSS-Anytime (α=-0.85)                                      | 30.4         | 0.37                   |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.1)                         | 30.4         | 0.25                   |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.5)                         | 30.8         | 0.24                   |

| TS-UCB (10 samples)                                         | 32.2         | 9.26                   |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.001)                       | 32.6         | 0.2                    |

| Gittins Index -- Whittle's Approximation (β=0.90)           | 35.8         | 0.28                   |

| IRS.FH (H=5)                                                | 35.8         | 1.7                    |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.5)                          | 36.4         | 0.23                   |

| BayesUCB (δ=0.300)                                          | 37.2         | 0.28                   |

| ϵ-Decreasing (ϵ=0.990)                                      | 38.0         | 0.21                   |

| POKER (H=5)                                                 | 38.4         | 0.37                   |

| POKER (H=10)                                                | 38.6         | 0.37                   |

| ReUCB (a=2.00)                                              | 39.2         | 1.2                    |

| BayesUCB (δ=0.400)                                          | 39.4         | 0.28                   |

| Greedy                                                      | 39.4         | 0.13                   |

| BayesUCB (δ=0.200)                                          | 39.6         | 0.28                   |

| POKER (H=1)                                                 | 39.6         | 0.35                   |

| ReUCB (a=1.50)                                              | 39.8         | 1.32                   |

| ϵ-Decreasing (ϵ=0.900)                                      | 40.0         | 0.21                   |

| ReUCB (a=1.00)                                              | 40.6         | 1.33                   |

| CODE (δ=0.990)                                              | 41.2         | 0.46                   |

| ϵ-Decreasing (ϵ=0.700)                                      | 41.4         | 0.22                   |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.1)                          | 42.4         | 0.23                   |

| MOSS-Anytime (α=-0.50)                                      | 43.2         | 0.4                    |

| Thompson Sampling with Virtual Helping Agents (Combiner C3) | 43.4         | 18.74                  |

| ϵ-Greedy (ϵ=0.010)                                          | 44.4         | 0.14                   |

| ϵ-Greedy (ϵ=0.020)                                          | 44.8         | 0.17                   |

| SoftElim (θ=0.10)                                           | 45.8         | 0.5                    |

| Gittins Index -- Whittle's Approximation (β=0.70)           | 47.2         | 0.27                   |

| BayesUCB (δ=0.500)                                          | 47.4         | 0.29                   |

| Gittins Index -- Whittle's Approximation (β=0.50)           | 48.4         | 0.25                   |

| TS-UCB (1 samples)                                          | 48.8         | 1.15                   |

| ϵ-Greedy (ϵ=0.050)                                          | 49.0         | 0.18                   |

| POKER (H=25)                                                | 49.8         | 0.38                   |

| MOSS-Anytime (α=-0.33)                                      | 52.2         | 0.32                   |

| ϵ-Decreasing (ϵ=0.500)                                      | 52.2         | 0.2                    |

| ϵ-Exploring Thompson Sampling                               | 52.4         | 0.23                   |

| IRS.FH (H=10)                                               | 52.6         | 1.76                   |

| POKER (H=50)                                                | 54.0         | 0.38                   |

| POKER (H=100)                                               | 55.8         | 0.4                    |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.001)                        | 56.6         | 0.23                   |

| RAVEN-UCB (a0=1, b0=5, eps=0.5)                             | 58.8         | 0.24                   |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.9)     | 59.4         | 0.48                   |

| RAVEN-UCB (a0=1, b0=5, eps=0.1)                             | 61.0         | 0.24                   |

| ϵ-Greedy (ϵ=0.100)                                          | 61.2         | 0.17                   |

| WR-SDA (forced_exploration=true)                            | 62.0         | 0.82                   |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.8)     | 62.2         | 0.45                   |

| UCBT                                                        | 64.0         | 0.15                   |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.95)    | 64.6         | 0.51                   |

| BayesUCB (δ=0.100)                                          | 65.6         | 0.26                   |

| IRS.FH (H=25)                                               | 65.8         | 2.14                   |

| Forced Exploration                                          | 67.8         | 0.16                   |

| RAVEN-UCB (a0=1, b0=5, eps=0.001)                           | 68.0         | 0.24                   |

| BayesUCB (δ=0.900)                                          | 68.2         | 0.3                    |

| WR-SDA (forced_exploration=false)                           | 68.8         | 0.79                   |

| SoftElim (θ=0.25)                                           | 70.8         | 0.5                    |

| ReBoot (r=0.25)                                             | 71.0         | 0.27                   |

| MARS (δ=0.100)                                              | 72.2         | 0.45                   |

| POKER (H=250)                                               | 74.4         | 0.4                    |

| Batch Ensemble for MAB (m=2)                                | 75.0         | 0.2                    |

| Thompson Sampling                                           | 77.8         | 0.86                   |

| Weighted Bootstrap                                          | 78.6         | 3.73                   |

| Satisficing Thompson Sampling (ϵ=0.005)                     | 79.2         | 1.19                   |

| ReBoot (r=0.50)                                             | 79.8         | 0.33                   |

| Satisficing Thompson Sampling (ϵ=0.010)                     | 80.4         | 1.18                   |

| Bootstrapped Thompson Sampling (J=10)                       | 81.4         | 0.52                   |

| KL-UCB                                                      | 82.2         | 10.2                   |

| Hellinger-UCB                                               | 82.6         | 2.98                   |

| Vanilla Residual Bootstrap (init=0)                         | 83.0         | 0.22                   |

| Bootstrapped Thompson Sampling (J=500)                      | 84.2         | 6.06                   |

| Bootstrapped Thompson Sampling (J=1000)                     | 85.4         | 12.09                  |

| Bootstrapped Thompson Sampling (J=100)                      | 85.6         | 1.53                   |

| Non-Parametric Thompson Sampling                            | 85.8         | 4.7                    |

| Multiplier Bootstrap-based Exploration                      | 85.8         | 7.29                   |

| CODE (δ=0.900)                                              | 89.4         | 0.5                    |

| UCB1-Tuned                                                  | 89.6         | 0.4                    |

| Garbage In, Reward Out (a=0.10)                             | 90.6         | 1.22                   |

| Bounded Dirichlet Sampling                                  | 94.2         | 2.47                   |

| Satisficing Thompson Sampling (ϵ=0.050)                     | 94.2         | 1.13                   |

| Vanilla Residual Bootstrap (init=1)                         | 95.0         | 0.28                   |

| MARS (δ=0.010)                                              | 95.6         | 2.52                   |

| ϵ-Decreasing (ϵ=0.200)                                      | 96.4         | 0.18                   |

| SoftElim (θ=0.50)                                           | 96.6         | 0.5                    |

| Kullback-Leibler Maillard Sampling                          | 97.4         | 0.67                   |

| Perturbed-History Exploration (a=1.1)                       | 100.0        | 1.2                    |

| RS (a=0.50)                                                 | 101.0        | 0.24                   |

| RS (a=0.65)                                                 | 101.2        | 0.26                   |

| EB-TCI                                                      | 101.2        | 0.42                   |

| Softsatisficing (a=0.65)                                    | 106.0        | 0.41                   |

| Batch Ensemble for MAB (m=4)                                | 106.0        | 0.29                   |

| Tsallis-INF                                                 | 107.8        | 1.29                   |

| Batch Ensemble for MAB (m=8)                                | 109.0        | 0.38                   |

| ReBoot (r=0.90)                                             | 109.2        | 0.32                   |

| MARS (δ=0.001)                                              | 109.4        | 17.73                  |

| MARS (δ=0.002)                                              | 109.6        | 10.25                  |

| Garbage In, Reward Out (a=0.33)                             | 110.2        | 1.56                   |

| MARS (δ=1.000)                                              | 111.4        | 0.13                   |

| RS (a=0.90)                                                 | 112.0        | 0.27                   |

| Satisficing Thompson Sampling (ϵ=0.100)                     | 112.4        | 1.14                   |

| ETC (m=10)                                                  | 112.4        | 0.2                    |

| lil' UCB (δ=0.100)                                          | 112.6        | 0.39                   |

| RS (a=0.75)                                                 | 114.0        | 0.26                   |

| FTPL-GR (lr=1.000)                                          | 114.6        | 6.44                   |

| Vanilla Residual Bootstrap (init=5)                         | 115.2        | 0.27                   |

| ReBoot (r=1.00)                                             | 115.8        | 0.31                   |

| SoftElim (θ=1.00)                                           | 117.2        | 0.51                   |

| Perturbed-History Exploration (a=2.1)                       | 119.6        | 1.43                   |

| Softsatisficing (a=0.75)                                    | 120.4        | 0.47                   |

| Softsatisficing (a=0.90)                                    | 122.6        | 0.51                   |

| ETC (m=20)                                                  | 123.8        | 0.2                    |

| FTPL-GR (lr=0.100)                                          | 124.0        | 5.1                    |

| ϵ-Decreasing (ϵ=0.100)                                      | 124.0        | 0.13                   |

| lil' UCB (δ=0.010)                                          | 126.6        | 0.38                   |

| Gradient Bandit                                             | 126.8        | 0.54                   |

| ETC (m=5)                                                   | 127.4        | 0.21                   |

| Garbage In, Reward Out (a=1.00)                             | 127.6        | 1.58                   |

| Gradient Bandit (with baseline)                             | 128.2        | 0.57                   |

| Boltzmann-Gumbel Exploration                                | 128.4        | 0.49                   |

| Softsatisficing (a=0.50)                                    | 129.2        | 0.33                   |

| ETC (m=25)                                                  | 129.8        | 0.21                   |

| ReBoot (r=1.50)                                             | 130.2        | 0.31                   |

| RS (a=0.25)                                                 | 131.2        | 0.24                   |

| RS (a=0.99)                                                 | 131.6        | 0.26                   |

| lil' UCB (δ=0.001)                                          | 133.0        | 0.37                   |

| SoftElim (θ=2.00)                                           | 134.2        | 0.51                   |

| ReBoot (r=1.70)                                             | 134.2        | 0.31                   |

| Softsatisficing (a=0.25)                                    | 134.2        | 0.2                    |

| RS (a=0.10)                                                 | 134.2        | 0.24                   |

| Softsatisficing (a=0.99)                                    | 134.6        | 0.61                   |

| Least Failures                                              | 135.2        | 0.13                   |

| Perturbed-History Exploration (a=5.1)                       | 137.0        | 1.45                   |

| ReBoot (r=2.10)                                             | 140.8        | 0.31                   |

| UCB1                                                        | 141.0        | 0.26                   |

| EXP-IX                                                      | 141.4        | 0.54                   |

| ETC (m=3)                                                   | 142.4        | 0.22                   |

| Softsatisficing (a=0.10)                                    | 143.0        | 0.07                   |

| RAVEN-UCB (a0=5, b0=1, eps=0.5)                             | 145.0        | 0.23                   |

| RAVEN-UCB (a0=5, b0=1, eps=0.1)                             | 145.6        | 0.23                   |

| RAVEN-UCB (a0=5, b0=1, eps=0.001)                           | 147.0        | 0.24                   |

| ETC (m=2)                                                   | 147.6        | 0.17                   |

| SoftElim (θ=5.00)                                           | 150.2        | 0.53                   |

| RS (a=0.00)                                                 | 150.4        | 0.19                   |

| FTPL-GR (lr=0.010)                                          | 154.4        | 5.32                   |

| FTPL-GR (lr=0.001)                                          | 160.6        | 5.28                   |

| CODE (δ=0.050)                                              | 161.8        | 0.49                   |

| Random                                                      | 161.8        | 0.03                   |
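
The average ranks above are all multiples of 0.2, which is consistent with averaging one integer rank per experiment over five experiments. The following sketch shows how such a column could be computed. It assumes that algorithms are ranked by mean regret within each experiment (lower is better) and that ties are broken by position; neither detail is confirmed by this README, and the function names are hypothetical.

```rust
/// Ranks (1 = best) for one experiment, where lower regret is better.
/// Ties are broken by position, an arbitrary choice made for this sketch.
pub fn ranks(regrets: &[f64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..regrets.len()).collect();
    // `sort_by` is stable, so equal regrets keep their original order.
    order.sort_by(|&a, &b| regrets[a].total_cmp(&regrets[b]));
    let mut rank = vec![0; regrets.len()];
    for (position, &algo) in order.iter().enumerate() {
        rank[algo] = position + 1;
    }
    rank
}

/// Average rank per algorithm. `experiments[e][a]` is the regret of
/// algorithm `a` in experiment `e`; at least one experiment is required.
pub fn average_ranks(experiments: &[Vec<f64>]) -> Vec<f64> {
    let n_algos = experiments[0].len();
    let mut totals = vec![0.0; n_algos];
    for regrets in experiments {
        for (algo, r) in ranks(regrets).into_iter().enumerate() {
            totals[algo] += r as f64;
        }
    }
    let n_experiments = experiments.len() as f64;
    totals.iter().map(|t| t / n_experiments).collect()
}
```

With two made-up experiments `[16.2, 18.1, 30.0]` and `[20.0, 19.0, 40.0]`, the first two algorithms swap places and both end up with an average rank of 1.5, while the third stays at 3.0.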

## Data

### Uniform

This experiment uses 10 arms, with means sampled uniformly from the interval
[0, 1]. This is a relatively easy instance, because there is likely to be a
single best arm that is easy to find. This is reflected in the %-Optimal column,
where the best algorithms pull the optimal arm more than two thirds of the time.
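
The two headline metrics can be sketched for a single run as follows. This assumes that regret is the pseudo-regret (the gap between the best arm's mean and the mean of each pulled arm, summed over all pulls) and that %-Optimal is the share of pulls spent on the best arm. The README does not spell out either definition, and `summarize` is a hypothetical name.

```rust
/// Summarizes one run as `(percent_optimal, regret)`.
///
/// `means[i]` is the true success probability of arm `i`, and
/// `chosen` lists the arm pulled at each step (assumed non-empty).
pub fn summarize(means: &[f64], chosen: &[usize]) -> (f64, f64) {
    let best_mean = means.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // Pseudo-regret: expected reward lost compared to always pulling
    // the best arm.
    let regret: f64 = chosen.iter().map(|&arm| best_mean - means[arm]).sum();
    // Count the pulls that landed on an arm with the best mean.
    let optimal = chosen.iter().filter(|&&arm| means[arm] == best_mean).count();
    let percent_optimal = 100.0 * optimal as f64 / chosen.len() as f64;
    (percent_optimal, regret)
}
```

For example, with means `[0.2, 0.9, 0.5]` and pulls `[1, 1, 0, 2]`, half of the pulls hit the best arm and the regret is roughly `0.7 + 0.4 = 1.1`. The table's values would then be aggregates of such per-run numbers over many runs.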

Results

| Algorithm                                                   | %-Optimal | Regret (Mean) | Regret (Median Absolute Deviation) |  Time  |

| ----------------------------------------------------------- | --------: | ------------: | ---------------------------------: | :----: |

| Gittins Index -- Whittle's Approximation (β=0.99)           |     65.59 |       16.1524 |                             6.1313 | 0.28s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.99)    |     68.73 |       16.2172 |                             3.6835 | 0.54s  |

| BayesUCB (δ=0.300)                                          |     68.64 |       16.4619 |                             5.4296 | 0.26s  |

| Batch Ensemble for MAB (m=0)                                |     71.10 |       16.5066 |                             2.6067 | 0.09s  |

| Batch Ensemble for MAB (m=1)                                |     70.97 |       16.7903 |                             2.9368 | 0.10s  |

| TS-UCB (100 samples)                                        |     71.96 |       16.8902 |                             3.3855 | 92.62s |

| IRS.FH (H=2)                                                |     71.14 |       16.9375 |                             2.7279 | 1.83s  |

| IRS.FH (H=3)                                                |     72.29 |       16.9395 |                             3.0432 | 1.87s  |

| BayesUCB (δ=0.400)                                          |     64.26 |       17.1313 |                             6.9164 | 0.26s  |

| TS-UCB (10 samples)                                         |     72.56 |       17.2502 |                             3.7256 | 8.52s  |

| Gittins Index -- Whittle's Approximation (β=0.90)           |     61.89 |       17.4329 |                             6.6927 | 0.25s  |

| BayesUCB (δ=0.200)                                          |     71.21 |       17.5846 |                             4.0029 | 0.25s  |

| IRS.FH (H=4)                                                |     72.57 |       17.6097 |                             3.5134 | 1.88s  |

| ϵ-Exploring TS-UCB (100 samples)                            |     69.94 |       17.8335 |                             2.8808 | 8.21s  |

| ϵ-Exploring TS-UCB (10 samples)                             |     70.16 |       17.9184 |                             2.9572 | 1.01s  |

| ϵ-Exploring TS-UCB (1 samples)                              |     70.36 |       18.0420 |                             3.0933 | 0.19s  |

| IRS.FH (H=1)                                                |     68.89 |       18.0936 |                             2.5509 | 1.79s  |

| UCB-DT (γ=1.00)                                             |     69.93 |       18.1466 |                             2.5287 | 3.05s  |

| UCB-DT (γ=0.95)                                             |     72.44 |       18.1946 |                             2.4725 | 3.50s  |

| UCB-DT (γ=0.75)                                             |     72.50 |       18.1962 |                             2.5172 | 2.92s  |

| UCB-DT (γ=0.90)                                             |     72.42 |       18.2016 |                             2.4807 | 3.41s  |

| IRS.FH (H=5)                                                |     72.59 |       18.3624 |                             4.0320 | 1.82s  |

| MOSS-Anytime (α=-0.85)                                      |     69.71 |       18.8113 |                             2.5659 | 0.23s  |

| SoftElim (θ=0.01)                                           |     68.33 |       18.8211 |                             2.5745 | 0.44s  |

| CODE (δ=0.990)                                              |     68.91 |       18.9329 |                             2.9569 | 0.42s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.5)                         |     70.40 |       19.0812 |                             2.7529 | 0.28s  |

| POKER (H=10)                                                |     65.48 |       19.3526 |                             3.2000 | 0.38s  |

| POKER (H=5)                                                 |     66.34 |       19.3558 |                             2.7035 | 0.36s  |

| ReUCB (a=2.00)                                              |     68.28 |       19.4080 |                             2.6887 | 1.02s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.1)                         |     70.19 |       19.4646 |                             2.7603 | 0.34s  |

| TS-UCB (1 samples)                                          |     71.83 |       19.5545 |                             5.3564 | 1.17s  |

| ReUCB (a=1.50)                                              |     67.94 |       19.5633 |                             2.7023 | 1.08s  |

| POKER (H=1)                                                 |     66.55 |       19.5748 |                             2.5553 | 0.35s  |

| ReUCB (a=1.00)                                              |     67.60 |       19.7102 |                             2.7106 | 1.14s  |

| Greedy                                                      |     66.26 |       19.7129 |                             2.5470 | 0.10s  |

| BayesUCB (δ=0.500)                                          |     58.32 |       20.1518 |                             9.3444 | 0.29s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.001)                       |     69.57 |       20.5446 |                             2.8610 | 0.21s  |

| ϵ-Decreasing (ϵ=0.990)                                      |     66.35 |       20.7765 |                             2.7735 | 0.21s  |

| SoftElim (θ=0.10)                                           |     67.97 |       20.9660 |                             3.2737 | 0.50s  |

| BayesUCB (δ=0.100)                                          |     68.95 |       20.9900 |                             3.7246 | 0.18s  |

| Thompson Sampling with Virtual Helping Agents (Combiner C3) |     63.16 |       21.1041 |                             6.1932 | 32.19s |

| ϵ-Greedy (ϵ=0.010)                                          |     66.18 |       21.1769 |                             2.8588 | 0.11s  |

| IRS.FH (H=10)                                               |     71.83 |       21.2519 |                             5.4688 | 1.92s  |

| ϵ-Decreasing (ϵ=0.900)                                      |     66.48 |       21.2824 |                             2.8492 | 0.21s  |

| POKER (H=25)                                                |     61.38 |       21.3386 |                             6.6186 | 0.36s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.5)                          |     68.76 |       22.2566 |                             2.9656 | 0.26s  |

| MOSS-Anytime (α=-0.50)                                      |     70.74 |       22.4582 |                             2.7088 | 0.24s  |

| ϵ-Greedy (ϵ=0.020)                                          |     65.99 |       22.7752 |                             3.1672 | 0.14s  |

| RS (a=0.75)                                                 |     57.47 |       23.4688 |                            11.0929 | 0.25s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.1)                          |     68.14 |       23.4949 |                             3.2027 | 0.25s  |

| ϵ-Decreasing (ϵ=0.700)                                      |     66.55 |       23.6847 |                             3.3687 | 0.19s  |

| WR-SDA (forced_exploration=false)                           |     66.87 |       23.8280 |                             5.0922 | 0.67s  |

| WR-SDA (forced_exploration=true)                            |     69.42 |       24.2844 |                             5.2129 | 0.64s  |

| IRS.FH (H=25)                                               |     70.09 |       24.4452 |                             6.0015 | 2.29s  |

| MOSS-Anytime (α=-0.33)                                      |     69.75 |       24.4536 |                             2.6909 | 0.23s  |

| Gittins Index -- Whittle's Approximation (β=0.70)           |     51.65 |       24.9930 |                            11.3705 | 0.24s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.5)                             |     65.05 |       25.4596 |                             3.6551 | 0.30s  |

| POKER (H=50)                                                |     56.21 |       25.6788 |                             9.6913 | 0.38s  |

| SoftElim (θ=0.25)                                           |     64.43 |       26.0903 |                             4.2092 | 0.49s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.1)                             |     64.72 |       26.1332 |                             3.7397 | 0.26s  |

| ϵ-Greedy (ϵ=0.050)                                          |     65.45 |       27.3929 |                             4.0210 | 0.15s  |

| ϵ-Exploring Thompson Sampling                               |     62.82 |       27.9018 |                             9.2377 | 0.21s  |

| UCBT                                                        |     65.40 |       28.7984 |                             4.0759 | 0.11s  |

| Thompson Sampling                                           |     66.16 |       28.8956 |                             7.1444 | 0.85s  |

| Weighted Bootstrap                                          |     66.08 |       28.9034 |                             7.1124 | 3.56s  |

| Satisficing Thompson Sampling (ϵ=0.005)                     |     65.94 |       29.0318 |                             7.1008 | 1.08s  |

| Satisficing Thompson Sampling (ϵ=0.010)                     |     65.61 |       29.3229 |                             7.0179 | 1.11s  |

| KL-UCB                                                      |     66.78 |       29.6304 |                             7.3837 | 8.55s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.001)                        |     65.19 |       29.6926 |                             4.6441 | 0.26s  |

| ReBoot (r=0.25)                                             |     61.18 |       30.3599 |                             5.2731 | 0.26s  |

| Softsatisficing (a=0.75)                                    |     49.75 |       30.3853 |                            17.8608 | 0.14s  |

| Gittins Index -- Whittle's Approximation (β=0.50)           |     46.19 |       30.4565 |                            14.2565 | 0.22s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.001)                           |     62.48 |       30.6351 |                             4.5496 | 0.26s  |

| CODE (δ=0.900)                                              |     54.94 |       30.6423 |                             6.5536 | 0.44s  |

| Batch Ensemble for MAB (m=2)                                |     62.49 |       30.7481 |                             7.8970 | 0.17s  |

| POKER (H=100)                                               |     51.57 |       30.8991 |                            12.6895 | 0.36s  |

| Hellinger-UCB                                               |     63.89 |       31.0005 |                             7.0702 | 2.64s  |

| ϵ-Decreasing (ϵ=0.500)                                      |     65.55 |       31.3306 |                             4.6232 | 0.18s  |

| UCB1-Tuned                                                  |     62.03 |       31.6747 |                             3.6906 | 0.35s  |

| Vanilla Residual Bootstrap (init=0)                         |     59.99 |       33.1442 |                             5.4073 | 0.21s  |

| Non-Parametric Thompson Sampling                            |     63.70 |       33.7962 |                             7.1820 | 4.46s  |

| RS (a=0.65)                                                 |     44.01 |       33.8259 |                            18.1643 | 0.25s  |

| ReBoot (r=0.50)                                             |     58.58 |       34.0829 |                             5.9224 | 0.31s  |

| Bounded Dirichlet Sampling                                  |     63.86 |       34.1647 |                             7.1345 | 2.13s  |

| MARS (δ=0.100)                                              |     64.47 |       34.5294 |                             4.8042 | 0.45s  |

| Satisficing Thompson Sampling (ϵ=0.050)                     |     57.19 |       35.0506 |                             6.7983 | 1.12s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.95)    |     42.16 |       35.0542 |                            18.6679 | 0.57s  |

| ϵ-Greedy (ϵ=0.100)                                          |     63.98 |       35.8380 |                             5.3322 | 0.16s  |

| Multiplier Bootstrap-based Exploration                      |     60.70 |       36.1612 |                             4.2418 | 7.07s  |

| SoftElim (θ=0.50)                                           |     57.22 |       36.3273 |                             4.6857 | 0.51s  |

| Kullback-Leibler Maillard Sampling                          |     59.67 |       37.5162 |                             8.3979 | 0.60s  |

| Perturbed-History Exploration (a=1.1)                       |     56.96 |       37.8929 |                             5.6711 | 0.84s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.9)     |     40.26 |       38.2054 |                            20.8866 | 0.47s  |

| POKER (H=250)                                               |     46.27 |       38.6838 |                            15.5508 | 0.36s  |

| Garbage In, Reward Out (a=0.10)                             |     57.65 |       38.7302 |                             5.2772 | 1.08s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.8)     |     39.50 |       39.7859 |                            22.1131 | 0.44s  |

| Softsatisficing (a=0.65)                                    |     40.21 |       39.8959 |                            24.8725 | 0.11s  |

| BayesUCB (δ=0.900)                                          |     39.28 |       40.0985 |                            22.3598 | 0.28s  |

| FTPL-GR (lr=1.000)                                          |     58.80 |       40.3823 |                            10.4546 | 7.14s  |

| Vanilla Residual Bootstrap (init=1)                         |     59.43 |       40.6304 |                             4.7837 | 0.27s  |

| Bootstrapped Thompson Sampling (J=500)                      |     40.59 |       41.9370 |                            21.7066 | 5.70s  |

| Bootstrapped Thompson Sampling (J=1000)                     |     40.88 |       41.9668 |                            21.1936 | 11.62s |

| Bootstrapped Thompson Sampling (J=100)                      |     40.77 |       42.3584 |                            21.7453 | 1.40s  |

| Bootstrapped Thompson Sampling (J=10)                       |     39.55 |       42.8224 |                            21.8677 | 0.44s  |

| RS (a=0.90)                                                 |     61.99 |       43.3480 |                            18.0732 | 0.26s  |

| Satisficing Thompson Sampling (ϵ=0.100)                     |     44.13 |       44.2992 |                            10.4673 | 1.15s  |

| lil' UCB (δ=0.100)                                          |     52.19 |       44.8365 |                             5.5606 | 0.34s  |

| Tsallis-INF                                                 |     54.25 |       46.4787 |                             5.9697 | 1.30s  |

| Softsatisficing (a=0.90)                                    |     56.49 |       46.6220 |                            27.5781 | 0.32s  |

| Forced Exploration                                          |     62.89 |       46.6666 |                             6.2607 | 0.16s  |

| ReBoot (r=0.90)                                             |     52.24 |       47.2795 |                             6.7367 | 0.31s  |

| Garbage In, Reward Out (a=0.33)                             |     51.74 |       49.2706 |                             5.5459 | 1.47s  |

| Vanilla Residual Bootstrap (init=5)                         |     55.69 |       50.7442 |                             6.1208 | 0.26s  |

| SoftElim (θ=1.00)                                           |     47.68 |       51.6367 |                             6.0049 | 0.51s  |

| ReBoot (r=1.00)                                             |     49.90 |       51.8800 |                             6.7533 | 0.30s  |

| MARS (δ=0.010)                                              |     54.83 |       53.5390 |                             6.1628 | 2.49s  |

| Batch Ensemble for MAB (m=4)                                |     48.14 |       53.9552 |                            22.0397 | 0.24s  |

| EB-TCI                                                      |     42.82 |       55.0174 |                            15.7714 | 0.39s  |

| MARS (δ=0.001)                                              |     50.10 |       55.4818 |                            10.4818 | 16.94s |

| RS (a=0.50)                                                 |     32.37 |       55.6085 |                            36.1138 | 0.24s  |

| Perturbed-History Exploration (a=2.1)                       |     47.44 |       56.5448 |                             6.0521 | 1.04s  |

| ETC (m=10)                                                  |     47.32 |       56.6956 |                            11.0554 | 0.18s  |

| FTPL-GR (lr=0.100)                                          |     50.69 |       57.1522 |                             7.0720 | 4.92s  |

| MARS (δ=0.002)                                              |     50.18 |       59.5941 |                             8.8231 | 9.32s  |

| lil' UCB (δ=0.010)                                          |     44.08 |       62.1486 |                             6.5312 | 0.33s  |

| Softsatisficing (a=0.50)                                    |     28.97 |       63.5094 |                            43.9773 | 0.09s  |

| MARS (δ=1.000)                                              |     37.18 |       65.5059 |                            21.5650 | 0.11s  |

| Garbage In, Reward Out (a=1.00)                             |     43.03 |       66.4802 |                             6.9482 | 1.54s  |

| Boltzmann-Gumbel Exploration                                |     43.87 |       68.9250 |                             6.5817 | 0.44s  |

| Gradient Bandit                                             |     46.48 |       69.6675 |                             9.5534 | 0.48s  |

| Gradient Bandit (with baseline)                             |     48.72 |       70.6839 |                             6.1066 | 0.52s  |

| SoftElim (θ=2.00)                                           |     37.94 |       71.6155 |                             8.1877 | 0.51s  |

| ReBoot (r=1.50)                                             |     40.44 |       72.1794 |                             8.1305 | 0.31s  |

| lil' UCB (δ=0.001)                                          |     39.18 |       73.8291 |                             8.0325 | 0.35s  |

| ETC (m=5)                                                   |     27.93 |       78.7963 |                            24.1796 | 0.19s  |

| ReBoot (r=1.70)                                             |     37.41 |       79.4522 |                             8.9230 | 0.30s  |

| Batch Ensemble for MAB (m=8)                                |     34.92 |       80.6517 |                            46.0772 | 0.30s  |

| ϵ-Decreasing (ϵ=0.200)                                      |     50.82 |       81.7548 |                            11.1762 | 0.14s  |

| Perturbed-History Exploration (a=5.1)                       |     36.06 |       83.3539 |                             9.5119 | 1.24s  |

| RS (a=0.99)                                                 |     42.93 |       84.4523 |                            29.9659 | 0.27s  |

| ETC (m=20)                                                  |     49.52 |       85.1694 |                            11.9964 | 0.18s  |

| UCB1                                                        |     34.52 |       86.8474 |                            10.2054 | 0.18s  |

| Least Failures                                              |     40.55 |       88.7625 |                            28.1293 | 0.10s  |

| Softsatisficing (a=0.99)                                    |     39.79 |       89.3024 |                            27.8456 | 0.52s  |

| ReBoot (r=2.10)                                             |     32.31 |       92.8131 |                            10.7156 | 0.30s  |

| EXP-IX                                                      |     31.87 |       95.7830 |                            13.0250 | 0.53s  |

| ETC (m=3)                                                   |     22.30 |       98.5252 |                            27.0722 | 0.19s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.5)                             |     28.73 |      104.0006 |                            13.6507 | 0.26s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.1)                             |     28.72 |      104.0303 |                            13.6481 | 0.27s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.001)                           |     28.63 |      104.3277 |                            13.7403 | 0.26s  |

| RS (a=0.25)                                                 |     19.47 |      104.5549 |                            74.4142 | 0.24s  |

| ETC (m=25)                                                  |     41.95 |      105.2629 |                            14.8396 | 0.18s  |

| SoftElim (θ=5.00)                                           |     26.15 |      105.3719 |                            13.8624 | 0.54s  |

| ETC (m=2)                                                   |     20.21 |      110.5641 |                            26.8868 | 0.14s  |

| Softsatisficing (a=0.25)                                    |     18.13 |      114.1714 |                            81.1272 | 0.08s  |

| ϵ-Decreasing (ϵ=0.100)                                      |     35.59 |      127.2145 |                            17.7947 | 0.09s  |

| RS (a=0.10)                                                 |     16.10 |      132.1474 |                            91.9926 | 0.24s  |

| Softsatisficing (a=0.10)                                    |     15.69 |      135.0154 |                            93.5393 | 0.06s  |

| FTPL-GR (lr=0.010)                                          |     20.49 |      142.8645 |                            20.4026 | 4.91s  |

| RS (a=0.00)                                                 |     15.09 |      144.1846 |                            97.5086 | 0.20s  |

| CODE (δ=0.050)                                              |     10.94 |      187.9726 |                            24.8420 | 0.47s  |

| FTPL-GR (lr=0.001)                                          |     10.91 |      196.2329 |                            29.3352 | 5.24s  |

| Random                                                      |     10.01 |      204.0160 |                            30.3495 | 0.02s  |

### Half-Range

This experiment uses 10 arms, with the means sampled uniformly from the interval

[0.25, 0.75]. This is a harder instance, because the arm means are closer together

and thus harder to distinguish.

This experiment was taken from the Garbage In, Reward Out (GIRO) paper.
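To make the setup concrete, here is a minimal sketch of how such an instance could be sampled. It is not the benchmark's actual code: the `SplitMix64` generator, `sample_means`, and `top_gap` are hypothetical names introduced for illustration, and the hand-rolled generator is only there so the example runs without external crates.

```rust
/// Minimal SplitMix64 generator (an assumption of this sketch, used so that
/// no external crate is needed).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample `k` arm means uniformly from the interval [lo, hi].
fn sample_means(rng: &mut SplitMix64, k: usize, lo: f64, hi: f64) -> Vec<f64> {
    (0..k).map(|_| lo + (hi - lo) * rng.next_f64()).collect()
}

/// Gap between the best and the second-best arm mean.
/// Expects at least two arms.
fn top_gap(means: &[f64]) -> f64 {
    let mut sorted = means.to_vec();
    // Sort in descending order; the means are never NaN here.
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    sorted[0] - sorted[1]
}

fn main() {
    let mut rng = SplitMix64(42);
    // Half-Range instance: 10 arms with means in [0.25, 0.75].
    let means = sample_means(&mut rng, 10, 0.25, 0.75);
    println!("arm means: {:?}", means);
    println!("gap between the two best arms: {:.4}", top_gap(&means));
}
```

Because every mean is `0.25 + 0.5 * u` for a uniform `u` in [0, 1), each gap between two arms is exactly half of what the same draws would give on the full [0, 1] interval, which is one way to see why the arms are harder to tell apart within 500 pulls.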

Results

| Algorithm                                                   | %-Optimal | Regret (Mean) | Regret (Median Absolute Deviation) |  Time  |

| ----------------------------------------------------------- | --------: | ------------: | ---------------------------------: | :----: |

| Batch Ensemble for MAB (m=0)                                |     44.01 |       24.3162 |                             7.5674 | 0.13s  |

| IRS.FH (H=2)                                                |     43.85 |       24.7422 |                             7.8222 | 1.49s  |

| IRS.FH (H=3)                                                |     45.28 |       25.0047 |                             6.6326 | 1.63s  |

| Gittins Index -- Whittle's Approximation (β=0.99)           |     44.09 |       25.0082 |                             7.8533 | 0.26s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.9)     |     40.05 |       25.2717 |                            12.2698 | 0.47s  |

| UCB-DT (γ=0.90)                                             |     43.02 |       25.6120 |                             7.2004 | 3.51s  |

| UCB-DT (γ=0.95)                                             |     43.00 |       25.6319 |                             7.1816 | 3.47s  |

| UCB-DT (γ=0.75)                                             |     43.05 |       25.6700 |                             7.2075 | 3.02s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.99)    |     41.68 |       25.7326 |                             9.2085 | 0.49s  |

| Gittins Index -- Whittle's Approximation (β=0.90)           |     41.53 |       25.7390 |                             9.3584 | 0.26s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.95)    |     40.51 |       25.7582 |                            11.6365 | 0.49s  |

| Gittins Index -- Whittle's Approximation (β=0.50)           |     40.01 |       25.7986 |                            11.1271 | 0.22s  |

| BayesUCB (δ=0.900)                                          |     39.16 |       25.8336 |                            13.1050 | 0.26s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.8)     |     38.83 |       26.1183 |                            12.9800 | 0.44s  |

| BayesUCB (δ=0.500)                                          |     42.67 |       26.1689 |                             8.4137 | 0.26s  |

| Gittins Index -- Whittle's Approximation (β=0.70)           |     40.02 |       26.4219 |                            10.1272 | 0.25s  |

| BayesUCB (δ=0.400)                                          |     43.04 |       26.4233 |                             8.1311 | 0.27s  |

| BayesUCB (δ=0.300)                                          |     44.61 |       26.4883 |                             7.0859 | 0.26s  |

| IRS.FH (H=4)                                                |     44.59 |       26.4971 |                             6.7171 | 1.78s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.001)                       |     42.52 |       26.5203 |                             7.2100 | 0.20s  |

| Batch Ensemble for MAB (m=1)                                |     43.55 |       26.6555 |                             8.8838 | 0.15s  |

| Thompson Sampling with Virtual Helping Agents (Combiner C3) |     44.11 |       26.7250 |                             8.7506 | 14.48s |

| IRS.FH (H=1)                                                |     39.73 |       26.9134 |                             9.8115 | 1.44s  |

| TS-UCB (100 samples)                                        |     45.02 |       26.9156 |                             6.2201 | 93.54s |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.1)                         |     41.15 |       27.0983 |                             8.1856 | 0.26s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.5)                          |     42.23 |       27.1170 |                             6.8959 | 0.23s  |

| RS (a=0.65)                                                 |     44.47 |       27.1789 |                            13.8551 | 0.26s  |

| BayesUCB (δ=0.200)                                          |     45.32 |       27.2126 |                             6.1882 | 0.26s  |

| ϵ-Exploring TS-UCB (1 samples)                              |     42.08 |       27.2128 |                             8.1379 | 0.24s  |

| MOSS-Anytime (α=-0.85)                                      |     40.04 |       27.3181 |                             8.7262 | 0.22s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.5)                         |     40.90 |       27.4143 |                             8.4479 | 0.26s  |

| MOSS-Anytime (α=-0.50)                                      |     44.05 |       27.4891 |                             5.4358 | 0.28s  |

| ϵ-Exploring TS-UCB (10 samples)                             |     41.55 |       27.5450 |                             8.1473 | 1.16s  |

| ϵ-Exploring TS-UCB (100 samples)                            |     41.08 |       27.6722 |                             8.2739 | 8.72s  |

| CODE (δ=0.990)                                              |     39.41 |       27.7728 |                            10.1499 | 0.46s  |

| TS-UCB (10 samples)                                         |     44.54 |       27.9224 |                             5.9276 | 10.66s |

| SoftElim (θ=0.01)                                           |     38.57 |       27.9881 |                             9.7007 | 0.44s  |

| UCB-DT (γ=1.00)                                             |     38.52 |       28.0522 |                             9.8213 | 2.94s  |

| IRS.FH (H=5)                                                |     43.60 |       28.0814 |                             6.8634 | 1.67s  |

| ϵ-Decreasing (ϵ=0.990)                                      |     38.24 |       28.1487 |                             9.6145 | 0.18s  |

| Greedy                                                      |     37.83 |       28.2076 |                             9.9996 | 0.11s  |

| Softsatisficing (a=0.65)                                    |     41.45 |       28.2436 |                            15.4217 | 0.22s  |

| ϵ-Decreasing (ϵ=0.900)                                      |     38.32 |       28.3069 |                             9.4761 | 0.19s  |

| SoftElim (θ=0.10)                                           |     40.87 |       28.3070 |                             8.5269 | 0.49s  |

| ϵ-Decreasing (ϵ=0.700)                                      |     39.35 |       28.3077 |                             8.7988 | 0.19s  |

| POKER (H=1)                                                 |     37.76 |       28.3667 |                            10.1082 | 0.35s  |

| POKER (H=5)                                                 |     37.76 |       28.3800 |                            10.0953 | 0.39s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.1)                          |     41.48 |       28.3881 |                             6.9847 | 0.23s  |

| POKER (H=10)                                                |     37.74 |       28.4050 |                            10.0473 | 0.38s  |

| ϵ-Greedy (ϵ=0.010)                                          |     38.03 |       28.4793 |                             9.7905 | 0.11s  |

| ReUCB (a=2.00)                                              |     38.15 |       28.5063 |                             9.6122 | 1.11s  |

| ReUCB (a=1.50)                                              |     38.07 |       28.5077 |                             9.7464 | 1.32s  |

| ReUCB (a=1.00)                                              |     37.96 |       28.5231 |                             9.8749 | 1.18s  |

| ϵ-Greedy (ϵ=0.020)                                          |     38.36 |       28.6900 |                             9.4808 | 0.15s  |

| POKER (H=25)                                                |     37.49 |       28.8412 |                             9.4550 | 0.40s  |

| ϵ-Greedy (ϵ=0.050)                                          |     39.46 |       29.3486 |                             8.7084 | 0.16s  |

| Bootstrapped Thompson Sampling (J=10)                       |     38.57 |       29.4073 |                            13.9756 | 0.53s  |

| ϵ-Decreasing (ϵ=0.500)                                      |     40.91 |       29.4333 |                             7.5048 | 0.18s  |

| MOSS-Anytime (α=-0.33)                                      |     42.29 |       29.8866 |                             5.9957 | 0.29s  |

| POKER (H=100)                                               |     38.92 |       29.9131 |                             6.5647 | 0.40s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.5)                             |     41.51 |       30.0406 |                             6.8188 | 0.26s  |

| BayesUCB (δ=0.100)                                          |     42.74 |       30.3237 |                             6.0061 | 0.20s  |

| POKER (H=50)                                                |     36.98 |       30.4262 |                             8.3416 | 0.39s  |

| RS (a=0.50)                                                 |     34.65 |       30.4958 |                            17.8406 | 0.24s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.1)                             |     41.13 |       30.6400 |                             6.5227 | 0.27s  |

| ϵ-Exploring Thompson Sampling                               |     40.14 |       30.7659 |                             8.9988 | 0.21s  |

| Bootstrapped Thompson Sampling (J=500)                      |     38.36 |       30.8943 |                            13.6813 | 6.05s  |

| Bootstrapped Thompson Sampling (J=100)                      |     38.23 |       30.9704 |                            13.6387 | 1.64s  |

| Bootstrapped Thompson Sampling (J=1000)                     |     37.93 |       31.2238 |                            13.7505 | 11.09s |

| SoftElim (θ=0.25)                                           |     41.35 |       31.4556 |                             6.6870 | 0.53s  |

| ϵ-Greedy (ϵ=0.100)                                          |     40.16 |       31.5381 |                             7.6639 | 0.16s  |

| TS-UCB (1 samples)                                          |     41.21 |       31.8313 |                             6.2230 | 1.10s  |

| IRS.FH (H=10)                                               |     41.13 |       32.0134 |                             6.6906 | 1.75s  |

| UCBT                                                        |     41.92 |       32.0754 |                             5.3843 | 0.13s  |

| MARS (δ=0.100)                                              |     40.72 |       32.4299 |                             6.8526 | 0.44s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.001)                        |     39.29 |       32.7851 |                             6.8408 | 0.23s  |

| Forced Exploration                                          |     41.72 |       33.1699 |                             5.7046 | 0.16s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.001)                           |     39.57 |       33.2162 |                             6.5485 | 0.26s  |

| Softsatisficing (a=0.50)                                    |     31.09 |       33.7174 |                            20.6161 | 0.11s  |

| WR-SDA (forced_exploration=true)                            |     40.23 |       33.8608 |                             6.5333 | 1.04s  |

| POKER (H=250)                                               |     37.22 |       33.9079 |                             8.0820 | 0.44s  |

| WR-SDA (forced_exploration=false)                           |     37.74 |       34.3702 |                             7.8470 | 0.99s  |

| IRS.FH (H=25)                                               |     38.87 |       35.4924 |                             6.7519 | 2.17s  |

| CODE (δ=0.900)                                              |     35.87 |       35.7202 |                            11.4984 | 0.46s  |

| UCB1-Tuned                                                  |     38.36 |       36.0304 |                             5.8517 | 0.31s  |

| ReBoot (r=0.25)                                             |     35.81 |       36.8892 |                             8.1828 | 0.27s  |

| Vanilla Residual Bootstrap (init=0)                         |     35.10 |       38.0391 |                             7.9288 | 0.22s  |

| Multiplier Bootstrap-based Exploration                      |     36.05 |       38.7066 |                             7.0003 | 6.94s  |

| SoftElim (θ=0.50)                                           |     35.14 |       39.4061 |                             7.2049 | 0.53s  |

| ReBoot (r=0.50)                                             |     34.21 |       39.5480 |                             8.2009 | 0.32s  |

| ETC (m=10)                                                  |     33.45 |       40.0881 |                            11.7950 | 0.18s  |

| Hellinger-UCB                                               |     36.12 |       40.4295 |                             6.1041 | 2.95s  |

| Weighted Bootstrap                                          |     35.00 |       40.5410 |                             7.4857 | 3.56s  |

| Thompson Sampling                                           |     35.01 |       40.5420 |                             7.5125 | 0.87s  |

| Satisficing Thompson Sampling (ϵ=0.005)                     |     34.96 |       40.5786 |                             7.5540 | 1.28s  |

| Satisficing Thompson Sampling (ϵ=0.010)                     |     34.87 |       40.6461 |                             7.5447 | 1.24s  |

| Batch Ensemble for MAB (m=2)                                |     35.57 |       40.7384 |                             9.1522 | 0.21s  |

| Garbage In, Reward Out (a=0.10)                             |     33.73 |       42.0945 |                             7.6013 | 1.32s  |

| Perturbed-History Exploration (a=1.1)                       |     33.49 |       42.3004 |                             7.7267 | 1.26s  |

| KL-UCB                                                      |     34.54 |       42.7149 |                             6.2245 | 9.61s  |

| EB-TCI                                                      |     30.56 |       42.8317 |                             9.3319 | 0.40s  |

| Satisficing Thompson Sampling (ϵ=0.050)                     |     32.52 |       43.1108 |                             8.0902 | 1.13s  |

| Non-Parametric Thompson Sampling                            |     33.09 |       43.6865 |                             7.5605 | 4.51s  |

| Vanilla Residual Bootstrap (init=1)                         |     32.88 |       43.7710 |                             7.4509 | 0.28s  |

| MARS (δ=0.010)                                              |     33.86 |       44.2200 |                             6.6462 | 2.43s  |

| Bounded Dirichlet Sampling                                  |     32.79 |       44.7466 |                             7.9659 | 2.79s  |

| Tsallis-INF                                                 |     32.35 |       45.6862 |                             8.4068 | 1.26s  |

| lil' UCB (δ=0.100)                                          |     31.70 |       46.4287 |                             6.7023 | 0.34s  |

| MARS (δ=1.000)                                              |     27.30 |       46.6224 |                            23.1260 | 0.12s  |

| Kullback-Leibler Maillard Sampling                          |     29.69 |       47.8324 |                             8.4744 | 0.60s  |

| Satisficing Thompson Sampling (ϵ=0.100)                     |     27.45 |       48.1450 |                            10.2207 | 1.11s  |

| Garbage In, Reward Out (a=0.33)                             |     30.11 |       48.1458 |                             8.0648 | 1.50s  |

| MARS (δ=0.001)                                              |     30.62 |       48.3961 |                             7.5977 | 19.75s |

| ReBoot (r=0.90)                                             |     29.34 |       48.4181 |                             8.4845 | 0.32s  |

| ϵ-Decreasing (ϵ=0.200)                                      |     33.79 |       49.1413 |                             7.5396 | 0.15s  |

| MARS (δ=0.002)                                              |     30.50 |       49.4331 |                             7.3296 | 10.25s |

| SoftElim (θ=1.00)                                           |     28.58 |       49.5066 |                             8.4513 | 0.51s  |

| ETC (m=5)                                                   |     21.32 |       50.0278 |                            17.6885 | 0.19s  |

| ReBoot (r=1.00)                                             |     27.89 |       50.9352 |                             8.6898 | 0.30s  |

| ETC (m=20)                                                  |     31.24 |       51.1732 |                             8.6350 | 0.18s  |

| Perturbed-History Exploration (a=2.1)                       |     27.91 |       52.2188 |                             8.4423 | 1.38s  |

| Softsatisficing (a=0.75)                                    |     30.32 |       52.6410 |                            16.5585 | 0.41s  |

| FTPL-GR (lr=0.100)                                          |     28.27 |       52.7166 |                            10.1011 | 4.84s  |

| Vanilla Residual Bootstrap (init=5)                         |     28.26 |       53.2834 |                             8.4062 | 0.28s  |

| RS (a=0.75)                                                 |     29.84 |       54.6692 |                            13.4306 | 0.28s  |

| Gradient Bandit                                             |     29.48 |       54.9276 |                             8.8193 | 0.48s  |

| Gradient Bandit (with baseline)                             |     29.07 |       55.8207 |                             8.2645 | 0.51s  |

| Batch Ensemble for MAB (m=4)                                |     26.48 |       55.8781 |                            10.9358 | 0.34s  |

| ETC (m=25)                                                  |     32.18 |       56.3820 |                             8.2546 | 0.18s  |

| FTPL-GR (lr=1.000)                                          |     26.06 |       56.8336 |                            12.0625 | 5.21s  |

| lil' UCB (δ=0.010)                                          |     25.83 |       56.9410 |                             8.2814 | 0.33s  |

| Garbage In, Reward Out (a=1.00)                             |     25.12 |       57.7304 |                             9.1152 | 1.50s  |

| Boltzmann-Gumbel Exploration                                |     25.61 |       58.0539 |                             8.8928 | 0.43s  |

| ReBoot (r=1.50)                                             |     22.85 |       61.0890 |                             9.6647 | 0.31s  |

| SoftElim (θ=2.00)                                           |     22.46 |       61.3682 |                             9.9029 | 0.49s  |

| lil' UCB (δ=0.001)                                          |     22.85 |       62.7995 |                             9.1698 | 0.31s  |

| ReBoot (r=1.70)                                             |     21.38 |       64.4112 |                            10.0761 | 0.31s  |

| Perturbed-History Exploration (a=5.1)                       |     21.44 |       65.8492 |                            10.0502 | 1.46s  |

| UCB1                                                        |     20.42 |       68.0927 |                            10.1489 | 0.19s  |

| ϵ-Decreasing (ϵ=0.100)                                      |     24.60 |       68.8686 |                             9.8576 | 0.10s  |

| ReBoot (r=2.10)                                             |     19.16 |       69.7726 |                            10.8419 | 0.31s  |

| ETC (m=3)                                                   |     15.41 |       69.9994 |                            18.3348 | 0.20s  |

| EXP-IX                                                      |     19.28 |       71.2582 |                            11.2795 | 0.51s  |

| RS (a=0.25)                                                 |     15.97 |       71.7134 |                            48.5357 | 0.24s  |

| Batch Ensemble for MAB (m=8)                                |     17.28 |       74.4122 |                            19.8696 | 0.36s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.5)                             |     17.71 |       74.8132 |                            11.3884 | 0.23s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.1)                             |     17.70 |       74.8207 |                            11.3883 | 0.23s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.001)                           |     17.67 |       74.8915 |                            11.3848 | 0.23s  |

| Softsatisficing (a=0.90)                                    |     17.73 |       76.2770 |                            12.8748 | 0.53s  |

| SoftElim (θ=5.00)                                           |     16.52 |       76.3033 |                            11.6823 | 0.50s  |

| RS (a=0.90)                                                 |     17.16 |       77.8704 |                            12.5705 | 0.28s  |

| Softsatisficing (a=0.25)                                    |     14.53 |       78.0274 |                            53.4058 | 0.08s  |

| ETC (m=2)                                                   |     15.27 |       80.4676 |                            18.0151 | 0.15s  |

| Softsatisficing (a=0.99)                                    |     15.71 |       81.5688 |                            13.1278 | 0.56s  |

| RS (a=0.99)                                                 |     15.39 |       82.4437 |                            12.9414 | 0.27s  |

| Least Failures                                              |     15.39 |       82.4443 |                            12.9451 | 0.14s  |

| RS (a=0.10)                                                 |     13.03 |       84.6250 |                            56.6531 | 0.24s  |

| FTPL-GR (lr=0.010)                                          |     14.69 |       85.0457 |                            12.8256 | 4.93s  |

| RS (a=0.00)                                                 |     12.85 |       85.6737 |                            57.3841 | 0.19s  |

| Softsatisficing (a=0.10)                                    |     12.81 |       86.5606 |                            58.7008 | 0.06s  |

| FTPL-GR (lr=0.001)                                          |     10.46 |      100.0294 |                            14.8858 | 5.25s  |

| Random                                                      |     10.01 |      102.0080 |                            15.1748 | 0.02s  |

| CODE (δ=0.050)                                              |     10.00 |      102.0185 |                            14.8649 | 0.47s  |

### Hard

This experiment uses 10 arms. All arms have a success probability of 0.5, except
for the best arm, which has a success probability of 0.51.

This experiment was taken from the paper describing Boltzmann-Gumbel Exploration.
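To give a feel for the scale of the regret numbers in this setting, here is a minimal sketch (not code from this repository) that computes the expected regret of a policy that pulls every arm equally often. It assumes the 500-pull horizon mentioned in the introduction. The function names `hard_arm_means` and `expected_regret` are hypothetical, and placing the best arm at index 0 is an arbitrary choice.

```rust
/// Arm means for the setup described above: nine arms at 0.5, one at 0.51.
/// Which index holds the best arm is an arbitrary assumption.
fn hard_arm_means() -> Vec<f64> {
    let mut means = vec![0.5; 10];
    means[0] = 0.51;
    means
}

/// Expected regret of a policy that pulls arm `i` a fixed fraction
/// `pull_fractions[i]` of the time over `horizon` pulls.
fn expected_regret(means: &[f64], pull_fractions: &[f64], horizon: u32) -> f64 {
    let best = means.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    means
        .iter()
        .zip(pull_fractions)
        .map(|(mean, fraction)| (best - mean) * fraction * f64::from(horizon))
        .sum()
}

fn main() {
    let means = hard_arm_means();
    // Uniformly random play: each of the 10 arms gets 10% of the pulls.
    let uniform = vec![0.1; 10];
    let regret = expected_regret(&means, &uniform, 500);
    println!("expected regret of uniform play: {regret:.4}");
}
```

Under these assumptions uniform play loses 0.01 per pull on 90% of 500 pulls, i.e. about 4.5 in total, so the whole table is compressed into a narrow band just below that value.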

Results

| Algorithm                                                   | %-Optimal | Regret (Mean) | Regret (Median Absolute Deviation) |  Time  |

| ----------------------------------------------------------- | --------: | ------------: | ---------------------------------: | :----: |

| POKER (H=100)                                               |     18.10 |        4.0949 |                             0.1700 | 0.39s  |

| Greedy                                                      |     17.00 |        4.1498 |                             0.1100 | 0.11s  |

| POKER (H=10)                                                |     17.00 |        4.1498 |                             0.1100 | 0.38s  |

| POKER (H=1)                                                 |     17.00 |        4.1498 |                             0.1100 | 0.35s  |

| POKER (H=25)                                                |     17.00 |        4.1498 |                             0.1100 | 0.38s  |

| POKER (H=5)                                                 |     17.00 |        4.1498 |                             0.1100 | 0.37s  |

| POKER (H=50)                                                |     17.00 |        4.1499 |                             0.1100 | 0.39s  |

| ϵ-Decreasing (ϵ=0.990)                                      |     16.90 |        4.1552 |                             0.1000 | 0.19s  |

| ϵ-Decreasing (ϵ=0.900)                                      |     16.80 |        4.1598 |                             0.1000 | 0.17s  |

| ϵ-Greedy (ϵ=0.010)                                          |     16.64 |        4.1682 |                             0.1000 | 0.12s  |

| ReUCB (a=1.00)                                              |     16.59 |        4.1704 |                             0.1200 | 1.09s  |

| ϵ-Decreasing (ϵ=0.700)                                      |     16.29 |        4.1854 |                             0.1000 | 0.20s  |

| ϵ-Greedy (ϵ=0.020)                                          |     16.25 |        4.1873 |                             0.1000 | 0.17s  |

| ReUCB (a=1.50)                                              |     15.54 |        4.2231 |                             0.1200 | 1.11s  |

| ReUCB (a=2.00)                                              |     15.41 |        4.2293 |                             0.1200 | 1.09s  |

| ϵ-Exploring TS-UCB (1 samples)                              |     15.26 |        4.2371 |                             0.1700 | 0.20s  |

| Gittins Index -- Whittle's Approximation (β=0.50)           |     15.19 |        4.2406 |                             0.1200 | 0.25s  |

| ϵ-Greedy (ϵ=0.050)                                          |     15.11 |        4.2447 |                             0.0900 | 0.16s  |

| ϵ-Decreasing (ϵ=0.500)                                      |     14.77 |        4.2614 |                             0.0800 | 0.18s  |

| ϵ-Exploring TS-UCB (10 samples)                             |     14.23 |        4.2887 |                             0.1700 | 1.01s  |

| Batch Ensemble for MAB (m=8)                                |     14.22 |        4.2892 |                             0.3800 | 0.41s  |

| ϵ-Exploring TS-UCB (100 samples)                            |     14.10 |        4.2951 |                             0.1700 | 8.79s  |

| ϵ-Decreasing (ϵ=0.200)                                      |     14.05 |        4.2973 |                             0.1600 | 0.16s  |

| ϵ-Greedy (ϵ=0.100)                                          |     13.97 |        4.3014 |                             0.0800 | 0.16s  |

| ϵ-Exploring Thompson Sampling                               |     13.74 |        4.3130 |                             0.1100 | 0.21s  |

| Batch Ensemble for MAB (m=2)                                |     13.68 |        4.3160 |                             0.2900 | 0.21s  |

| Forced Exploration                                          |     13.53 |        4.3235 |                             0.1000 | 0.16s  |

| RS (a=0.50)                                                 |     13.42 |        4.3290 |                             0.0100 | 0.25s  |

| IRS.FH (H=1)                                                |     13.36 |        4.3318 |                             0.1200 | 1.28s  |

| UCB-DT (γ=0.90)                                             |     13.27 |        4.3365 |                             0.1000 | 3.45s  |

| UCB-DT (γ=0.95)                                             |     13.27 |        4.3365 |                             0.1000 | 3.61s  |

| Gittins Index -- Whittle's Approximation (β=0.70)           |     13.25 |        4.3374 |                             0.1200 | 0.30s  |

| UCB-DT (γ=1.00)                                             |     13.19 |        4.3406 |                             0.1200 | 3.27s  |

| SoftElim (θ=0.01)                                           |     13.13 |        4.3434 |                             0.1100 | 0.41s  |

| UCB-DT (γ=0.75)                                             |     13.05 |        4.3474 |                             0.1000 | 3.41s  |

| MOSS-Anytime (α=-0.33)                                      |     13.00 |        4.3502 |                             0.2000 | 0.28s  |

| MOSS-Anytime (α=-0.85)                                      |     12.95 |        4.3526 |                             0.1800 | 0.28s  |

| MOSS-Anytime (α=-0.50)                                      |     12.94 |        4.3532 |                             0.1700 | 0.29s  |

| IRS.FH (H=2)                                                |     12.84 |        4.3582 |                             0.1700 | 1.48s  |

| Batch Ensemble for MAB (m=1)                                |     12.71 |        4.3646 |                             0.4800 | 0.12s  |

| Gittins Index -- Whittle's Approximation (β=0.99)           |     12.66 |        4.3672 |                             0.2000 | 0.30s  |

| IRS.FH (H=3)                                                |     12.63 |        4.3687 |                             0.1900 | 1.63s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.5)                          |     12.62 |        4.3690 |                             0.1800 | 0.23s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.1)                          |     12.62 |        4.3692 |                             0.2100 | 0.22s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.001)                       |     12.58 |        4.3710 |                             0.1800 | 0.19s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.1)                         |     12.58 |        4.3710 |                             0.1800 | 0.22s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.5)                         |     12.58 |        4.3710 |                             0.1800 | 0.23s  |

| Batch Ensemble for MAB (m=0)                                |     12.57 |        4.3714 |                             0.1700 | 0.09s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.8)     |     12.47 |        4.3767 |                             0.1200 | 0.43s  |

| BayesUCB (δ=0.500)                                          |     12.44 |        4.3782 |                             0.2000 | 0.26s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.001)                        |     12.41 |        4.3797 |                             0.2600 | 0.23s  |

| POKER (H=250)                                               |     12.40 |        4.3802 |                             0.2500 | 0.40s  |

| BayesUCB (δ=0.400)                                          |     12.37 |        4.3813 |                             0.2000 | 0.27s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.9)     |     12.37 |        4.3816 |                             0.1200 | 0.47s  |

| Gittins Index -- Whittle's Approximation (β=0.90)           |     12.37 |        4.3816 |                             0.1700 | 0.30s  |

| IRS.FH (H=4)                                                |     12.29 |        4.3857 |                             0.1900 | 1.52s  |

| TS-UCB (100 samples)                                        |     12.17 |        4.3915 |                             0.2500 | 84.25s |

| UCBT                                                        |     12.17 |        4.3916 |                             0.4200 | 0.13s  |

| BayesUCB (δ=0.300)                                          |     12.17 |        4.3917 |                             0.2500 | 0.28s  |

| SoftElim (θ=0.10)                                           |     12.05 |        4.3976 |                             0.2000 | 0.53s  |

| IRS.FH (H=5)                                                |     12.02 |        4.3990 |                             0.2200 | 1.51s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.001)                           |     12.01 |        4.3996 |                             0.3500 | 0.23s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.5)                             |     12.00 |        4.3999 |                             0.2500 | 0.22s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.1)                             |     11.98 |        4.4008 |                             0.2500 | 0.23s  |

| ϵ-Decreasing (ϵ=0.100)                                      |     11.91 |        4.4043 |                             0.1500 | 0.10s  |

| Bootstrapped Thompson Sampling (J=10)                       |     11.83 |        4.4083 |                             0.1600 | 0.48s  |

| Bootstrapped Thompson Sampling (J=500)                      |     11.80 |        4.4101 |                             0.3400 | 5.60s  |

| Bootstrapped Thompson Sampling (J=1000)                     |     11.78 |        4.4109 |                             0.3400 | 11.46s |

| Bootstrapped Thompson Sampling (J=100)                      |     11.76 |        4.4118 |                             0.3100 | 1.41s  |

| EB-TCI                                                      |     11.56 |        4.4218 |                             0.4400 | 0.38s  |

| Batch Ensemble for MAB (m=4)                                |     11.56 |        4.4221 |                             0.3100 | 0.30s  |

| WR-SDA (forced_exploration=false)                           |     11.52 |        4.4238 |                             0.3200 | 0.74s  |

| BayesUCB (δ=0.200)                                          |     11.51 |        4.4245 |                             0.2300 | 0.29s  |

| IRS.FH (H=10)                                               |     11.46 |        4.4270 |                             0.2400 | 1.62s  |

| TS-UCB (10 samples)                                         |     11.45 |        4.4275 |                             0.2500 | 8.47s  |

| Thompson Sampling with Virtual Helping Agents (Combiner C3) |     11.45 |        4.4276 |                             0.2600 | 6.91s  |

| Vanilla Residual Bootstrap (init=0)                         |     11.42 |        4.4292 |                             0.3500 | 0.22s  |

| MARS (δ=0.100)                                              |     11.41 |        4.4296 |                             0.2400 | 0.47s  |

| WR-SDA (forced_exploration=true)                            |     11.41 |        4.4296 |                             0.3000 | 0.76s  |

| CODE (δ=0.900)                                              |     11.39 |        4.4305 |                             0.4900 | 0.48s  |

| ReBoot (r=0.25)                                             |     11.38 |        4.4311 |                             0.3500 | 0.27s  |

| BayesUCB (δ=0.900)                                          |     11.37 |        4.4314 |                             0.1200 | 0.26s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.95)    |     11.35 |        4.4324 |                             0.1200 | 0.48s  |

| ReBoot (r=0.50)                                             |     11.34 |        4.4329 |                             0.3800 | 0.33s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.99)    |     11.29 |        4.4355 |                             0.0900 | 0.53s  |

| TS-UCB (1 samples)                                          |     11.21 |        4.4395 |                             0.2400 | 1.10s  |

| CODE (δ=0.990)                                              |     11.21 |        4.4397 |                             0.1200 | 0.43s  |

| Garbage In, Reward Out (a=0.10)                             |     11.16 |        4.4418 |                             0.3400 | 1.37s  |

| BayesUCB (δ=0.100)                                          |     11.16 |        4.4421 |                             0.2400 | 0.23s  |

| Non-Parametric Thompson Sampling                            |     11.16 |        4.4422 |                             0.3400 | 4.52s  |

| Satisficing Thompson Sampling (ϵ=0.010)                     |     11.15 |        4.4425 |                             0.3400 | 1.12s  |

| Satisficing Thompson Sampling (ϵ=0.005)                     |     11.15 |        4.4426 |                             0.3300 | 1.15s  |

| Thompson Sampling                                           |     11.14 |        4.4429 |                             0.3300 | 0.79s  |

| IRS.FH (H=25)                                               |     11.14 |        4.4431 |                             0.2600 | 2.29s  |

| SoftElim (θ=0.25)                                           |     11.14 |        4.4431 |                             0.2500 | 0.53s  |

| Perturbed-History Exploration (a=1.1)                       |     11.13 |        4.4433 |                             0.3600 | 1.07s  |

| Weighted Bootstrap                                          |     11.13 |        4.4436 |                             0.3400 | 3.51s  |

| Multiplier Bootstrap-based Exploration                      |     11.12 |        4.4439 |                             0.3100 | 7.53s  |

| Softsatisficing (a=0.65)                                    |     11.12 |        4.4441 |                             0.3000 | 0.51s  |

| Vanilla Residual Bootstrap (init=1)                         |     11.11 |        4.4443 |                             0.3500 | 0.28s  |

| Satisficing Thompson Sampling (ϵ=0.050)                     |     11.09 |        4.4454 |                             0.4000 | 1.00s  |

| Garbage In, Reward Out (a=0.33)                             |     11.04 |        4.4480 |                             0.3800 | 1.60s  |

| Tsallis-INF                                                 |     11.01 |        4.4497 |                             0.2700 | 1.32s  |

| KL-UCB                                                      |     10.99 |        4.4505 |                             0.2800 | 10.07s |

| MARS (δ=1.000)                                              |     10.95 |        4.4523 |                             0.0000 | 0.12s  |

| RS (a=0.65)                                                 |     10.95 |        4.4523 |                             0.2500 | 0.29s  |

| ReBoot (r=0.90)                                             |     10.94 |        4.4528 |                             0.3800 | 0.32s  |

| SoftElim (θ=0.50)                                           |     10.93 |        4.4536 |                             0.3100 | 0.51s  |

| MARS (δ=0.010)                                              |     10.93 |        4.4537 |                             0.3700 | 2.68s  |

| Kullback-Leibler Maillard Sampling                          |     10.91 |        4.4544 |                             0.3500 | 0.70s  |

| Perturbed-History Exploration (a=2.1)                       |     10.89 |        4.4557 |                             0.3400 | 1.43s  |

| SoftElim (θ=1.00)                                           |     10.88 |        4.4560 |                             0.3300 | 0.50s  |

| Vanilla Residual Bootstrap (init=5)                         |     10.85 |        4.4574 |                             0.2700 | 0.26s  |

| Hellinger-UCB                                               |     10.85 |        4.4575 |                             0.2800 | 2.99s  |

| lil' UCB (δ=0.100)                                          |     10.85 |        4.4575 |                             0.2600 | 0.33s  |

| ReBoot (r=1.00)                                             |     10.84 |        4.4578 |                             0.3500 | 0.30s  |

| Bounded Dirichlet Sampling                                  |     10.83 |        4.4586 |                             0.3100 | 2.50s  |

| MARS (δ=0.002)                                              |     10.79 |        4.4603 |                             0.2800 | 10.47s |

| UCB1-Tuned                                                  |     10.74 |        4.4632 |                             0.2400 | 0.37s  |

| Satisficing Thompson Sampling (ϵ=0.100)                     |     10.72 |        4.4641 |                             0.3100 | 1.01s  |

| MARS (δ=0.001)                                              |     10.71 |        4.4646 |                             0.3200 | 19.80s |

| lil' UCB (δ=0.010)                                          |     10.70 |        4.4651 |                             0.2200 | 0.34s  |

| Boltzmann-Gumbel Exploration                                |     10.67 |        4.4663 |                             0.2700 | 0.46s  |

| Garbage In, Reward Out (a=1.00)                             |     10.66 |        4.4669 |                             0.2600 | 1.48s  |

| Gradient Bandit                                             |     10.66 |        4.4672 |                             0.2000 | 0.55s  |

| Gradient Bandit (with baseline)                             |     10.60 |        4.4699 |                             0.2000 | 0.57s  |

| FTPL-GR (lr=0.100)                                          |     10.58 |        4.4711 |                             0.2500 | 5.51s  |

| lil' UCB (δ=0.001)                                          |     10.54 |        4.4730 |                             0.2000 | 0.32s  |

| ReBoot (r=1.50)                                             |     10.49 |        4.4756 |                             0.2100 | 0.30s  |

| FTPL-GR (lr=1.000)                                          |     10.46 |        4.4769 |                             0.2900 | 5.96s  |

| SoftElim (θ=2.00)                                           |     10.41 |        4.4797 |                             0.1700 | 0.48s  |

| Perturbed-History Exploration (a=5.1)                       |     10.40 |        4.4798 |                             0.1900 | 1.48s  |

| ReBoot (r=1.70)                                             |     10.40 |        4.4798 |                             0.1800 | 0.31s  |

| Softsatisficing (a=0.75)                                    |     10.38 |        4.4810 |                             0.1900 | 0.60s  |

| RS (a=0.75)                                                 |     10.36 |        4.4819 |                             0.1600 | 0.26s  |

| EXP-IX                                                      |     10.36 |        4.4822 |                             0.1600 | 0.54s  |

| ReBoot (r=2.10)                                             |     10.29 |        4.4854 |                             0.1400 | 0.30s  |

| RS (a=0.00)                                                 |     10.28 |        4.4861 |                             0.0000 | 0.19s  |

| ETC (m=25)                                                  |     10.27 |        4.4863 |                             0.0000 | 0.20s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.1)                             |     10.25 |        4.4873 |                             0.1100 | 0.22s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.5)                             |     10.25 |        4.4873 |                             0.1100 | 0.23s  |

| RS (a=0.10)                                                 |     10.24 |        4.4881 |                             0.0000 | 0.24s  |

| UCB1                                                        |     10.23 |        4.4883 |                             0.1600 | 0.17s  |

| Softsatisficing (a=0.50)                                    |     10.23 |        4.4884 |                             0.0000 | 0.12s  |

| RS (a=0.90)                                                 |     10.22 |        4.4891 |                             0.1100 | 0.28s  |

| RAVEN-UCB (a0=5, b0=1, eps=0.001)                           |     10.21 |        4.4893 |                             0.1200 | 0.23s  |

| Softsatisficing (a=0.90)                                    |     10.21 |        4.4897 |                             0.1200 | 0.55s  |

| Softsatisficing (a=0.99)                                    |     10.17 |        4.4914 |                             0.1100 | 0.55s  |

| SoftElim (θ=5.00)                                           |     10.17 |        4.4916 |                             0.1000 | 0.51s  |

| FTPL-GR (lr=0.010)                                          |     10.14 |        4.4930 |                             0.0900 | 5.39s  |

| ETC (m=5)                                                   |     10.11 |        4.4943 |                             0.0000 | 0.18s  |

| ETC (m=20)                                                  |     10.11 |        4.4946 |                             0.0000 | 0.18s  |

| Least Failures                                              |     10.08 |        4.4961 |                             0.0900 | 0.09s  |

| RS (a=0.99)                                                 |     10.08 |        4.4961 |                             0.0900 | 0.27s  |

| Softsatisficing (a=0.25)                                    |     10.05 |        4.4976 |                             0.0000 | 0.11s  |

| ETC (m=2)                                                   |     10.04 |        4.4982 |                             0.4300 | 0.13s  |

| ETC (m=3)                                                   |     10.04 |        4.4982 |                             0.4300 | 0.20s  |

| FTPL-GR (lr=0.001)                                          |     10.02 |        4.4988 |                             0.0400 | 4.93s  |

| Random                                                      |     10.02 |        4.4992 |                             0.0500 | 0.02s  |

| CODE (δ=0.050)                                              |     10.00 |        4.5000 |                             0.0000 | 0.47s  |

| RS (a=0.25)                                                 |     10.00 |        4.5000 |                             0.0000 | 0.25s  |

| Softsatisficing (a=0.10)                                    |     10.00 |        4.5000 |                             0.0000 | 0.07s  |

| ETC (m=10)                                                  |      9.94 |        4.5030 |                             0.0000 | 0.17s  |

### Beta

This experiment uses 10 arms. The arm means are sampled from a Beta(1, 8) distribution.

This experiment was taken from the paper _Multiplier Bootstrap-based Exploration_.
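As an illustration of how such arm means could be drawn, the sketch below samples from Beta(1, b) by inverting its CDF, F(x) = 1 - (1 - x)^b. This is not the repository's implementation: the `SplitMix64` generator is only a dependency-free stand-in for a real random number crate, and `sample_beta_1_b` and `sample_arm_means` are hypothetical names.

```rust
/// Tiny SplitMix64 generator, used here only to avoid external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Sample from Beta(1, b) via the inverse CDF: x = 1 - (1 - u)^(1/b).
/// This shortcut only works because the first shape parameter is 1.
fn sample_beta_1_b(rng: &mut SplitMix64, b: f64) -> f64 {
    1.0 - (1.0 - rng.next_f64()).powf(1.0 / b)
}

/// Draw one success probability per arm from Beta(1, 8).
fn sample_arm_means(rng: &mut SplitMix64, n_arms: usize) -> Vec<f64> {
    (0..n_arms).map(|_| sample_beta_1_b(rng, 8.0)).collect()
}

fn main() {
    let mut rng = SplitMix64(42); // arbitrary seed
    let means = sample_arm_means(&mut rng, 10);
    println!("{means:?}");
}
```

Beta(1, 8) has mean 1/9, so most arms in this experiment have low success probabilities and rewards are sparse.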

Results

| Algorithm                                                   | %-Optimal | Regret (Mean) | Regret (Median Absolute Deviation) |  Time  |

| ----------------------------------------------------------- | --------: | ------------: | ---------------------------------: | :----: |

| BayesUCB (δ=0.900)                                          |     57.80 |       20.8173 |                             5.4766 | 0.29s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.95)    |     57.01 |       21.0752 |                             5.5688 | 0.50s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.9)     |     56.78 |       21.0892 |                             5.6266 | 0.50s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.1)                         |     55.98 |       21.1003 |                             6.6151 | 0.23s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.5)                         |     55.97 |       21.1006 |                             6.6257 | 0.22s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.8)     |     56.64 |       21.1021 |                             5.7940 | 0.47s  |

| RAVEN-UCB (a0=0.5, b0=0.5, eps=0.001)                       |     55.97 |       21.1268 |                             6.5255 | 0.20s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.1)                          |     55.99 |       21.1428 |                             6.5261 | 0.23s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.5)                          |     55.95 |       21.1445 |                             6.5317 | 0.23s  |

| Gittins Index -- Whittle's Approximation (β=0.70)           |     57.25 |       21.1454 |                             5.2421 | 0.31s  |

| Gittins Index -- Whittle's Approximation (β=0.50)           |     56.58 |       21.2012 |                             5.5556 | 0.27s  |

| Gittins Index -- Brezzi and Lai's Approximation (β=0.99)    |     57.41 |       21.2745 |                             4.9635 | 0.50s  |

| RAVEN-UCB (a0=0.5, b0=10, eps=0.001)                        |     55.91 |       21.2843 |                             6.6230 | 0.23s  |

| Gittins Index -- Whittle's Approximation (β=0.90)           |     57.73 |       21.7090 |                             4.9167 | 0.29s  |

| IRS.FH (H=1)                                                |     55.74 |       21.8777 |                             5.8971 | 1.47s  |

| MOSS-Anytime (α=-0.85)                                      |     54.95 |       22.2898 |                             5.6517 | 0.24s  |

| Batch Ensemble for MAB (m=0)                                |     57.67 |       22.3441 |                             5.1453 | 0.10s  |

| UCB-DT (γ=0.75)                                             |     54.64 |       22.4071 |                             6.1492 | 3.48s  |

| UCB-DT (γ=0.90)                                             |     54.45 |       22.4627 |                             6.1571 | 3.82s  |

| UCB-DT (γ=0.95)                                             |     54.39 |       22.4968 |                             6.1852 | 3.54s  |

| UCB-DT (γ=1.00)                                             |     53.32 |       22.6778 |                             7.3649 | 3.80s  |

| SoftElim (θ=0.10)                                           |     56.27 |       22.7533 |                             6.0639 | 0.46s  |

| Thompson Sampling with Virtual Helping Agents (Combiner C3) |     56.94 |       22.9408 |                             7.1147 | 21.79s |

| IRS.FH (H=2)                                                |     56.33 |       23.0910 |                             5.0660 | 1.59s  |

| CODE (δ=0.990)                                              |     51.11 |       23.5974 |                             9.3932 | 0.49s  |

| Gittins Index -- Whittle's Approximation (β=0.99)           |     56.84 |       23.7789 |                             5.0319 | 0.31s  |

| MOSS-Anytime (α=-0.50)                                      |     56.24 |       24.1465 |                             4.0881 | 0.26s  |

| BayesUCB (δ=0.500)                                          |     56.42 |       24.2684 |                             5.5883 | 0.28s  |

| IRS.FH (H=3)                                                |     55.61 |       24.5294 |                             4.7675 | 1.59s  |

| TS-UCB (100 samples)                                        |     56.12 |       24.7396 |                             4.2715 | 97.10s |

| ReBoot (r=0.25)                                             |     52.26 |       24.7586 |                             8.6759 | 0.29s  |

| IRS.FH (H=4)                                                |     55.19 |       25.4645 |                             4.9019 | 1.66s  |

| BayesUCB (δ=0.400)                                          |     55.31 |       25.7435 |                             5.7135 | 0.29s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.5)                             |     54.87 |       26.1006 |                             5.6929 | 0.22s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.1)                             |     54.85 |       26.1238 |                             5.6912 | 0.23s  |

| MOSS-Anytime (α=-0.33)                                      |     54.80 |       26.1464 |                             4.2098 | 0.26s  |

| IRS.FH (H=5)                                                |     54.72 |       26.2315 |                             4.9551 | 1.69s  |

| TS-UCB (10 samples)                                         |     55.00 |       26.2465 |                             4.2202 | 9.53s  |

| RAVEN-UCB (a0=1, b0=5, eps=0.001)                           |     54.65 |       26.3698 |                             5.5242 | 0.23s  |

| SoftElim (θ=0.01)                                           |     47.37 |       26.4249 |                             8.8470 | 0.45s  |

| MARS (δ=0.100)                                              |     50.54 |       26.5911 |                             7.3123 | 0.41s  |

| Batch Ensemble for MAB (m=1)                                |     53.77 |       26.8886 |                             7.9552 | 0.13s  |

| BayesUCB (δ=0.300)                                          |     53.93 |       27.5129 |                             5.7981 | 0.27s  |

| RS (a=0.25)                                                 |     51.31 |       27.6540 |                            15.4650 | 0.23s  |

| IRS.FH (H=10)                                               |     53.19 |       28.6872 |                             5.2676 | 1.78s  |

| SoftElim (θ=0.25)                                           |     51.91 |       28.7964 |                             6.3713 | 0.44s  |

| UCBT                                                        |     47.49 |       28.8558 |                             8.0049 | 0.13s  |

| ReBoot (r=0.50)                                             |     51.44 |       28.9633 |                             6.3791 | 0.39s  |

| MARS (δ=0.010)                                              |     51.25 |       29.2541 |                             7.4360 | 2.37s  |

| TS-UCB (1 samples)                                          |     52.69 |       29.2908 |                             4.9082 | 1.23s  |

| ϵ-Exploring TS-UCB (1 samples)                              |     47.60 |       29.4661 |                             9.1640 | 0.24s  |

| Softsatisficing (a=0.25)                                    |     49.07 |       29.5243 |                            16.9209 | 0.64s  |

| ϵ-Exploring TS-UCB (10 samples)                             |     47.35 |       29.8509 |                             9.2614 | 1.09s  |

| BayesUCB (δ=0.200)                                          |     51.93 |       29.8533 |                             5.8074 | 0.27s  |

| ϵ-Exploring TS-UCB (100 samples)                            |     47.30 |       29.8928 |                             9.1374 | 9.16s  |

| RS (a=0.10)                                                 |     44.30 |       29.9723 |                            13.9283 | 0.22s  |

| Hellinger-UCB                                               |     50.41 |       30.1850 |                             5.4750 | 3.12s  |

| ϵ-Decreasing (ϵ=0.500)                                      |     45.66 |       30.9426 |                            10.3885 | 0.19s  |

| Bootstrapped Thompson Sampling (J=10)                       |     49.88 |       31.1623 |                             6.5576 | 0.61s  |

| Forced Exploration                                          |     48.86 |       31.4112 |                             9.0715 | 0.16s  |

| WR-SDA (forced_exploration=true)                            |     45.51 |       31.4285 |                             8.7057 | 1.25s  |

| IRS.FH (H=25)                                               |     50.57 |       32.0301 |