https://github.com/djo/delayed-bandit

Multi-armed bandit problem under delayed feedback: framework for the numerical experiments

Host: GitHub
URL: https://github.com/djo/delayed-bandit
Owner: djo
License: mit
Created: 2021-02-14T19:41:57.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-04-13T16:52:28.000Z (about 3 years ago)
Last Synced: 2025-03-30T05:11:33.750Z (over 1 year ago)
Language: Jupyter Notebook
Size: 187 KB
Stars: 8
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Multi-armed bandit (MAB) problem under delayed feedback: numerical experiments

![Build](https://github.com/djo/delayed-bandit/workflows/Python%20application/badge.svg)

A framework for numerical experiments that simulate the multi-armed bandit problem in a stochastic stationary environment with delayed feedback.

## Beta Upper Confidence Bound Policy for the Design of Clinical Trials, 2023

Evaluation of delay-adapted policies on the publicly available International Stroke Trial dataset. See [this notebook](Beta-Upper-Confidence-Bound-Policy-for-the-Design-of-Clinical-Trials.ipynb) for the analysis and simulation.

## Bernoulli multi-armed bandit problem under delayed feedback, 2021

This part provides the framework for numerical experiments that simulate the multi-armed bandit problem
in a stochastic stationary environment with delays. It accompanies the paper [Bernoulli multi-armed bandit problem under delayed feedback](https://djo.github.io/assets/bernoulli-multi-armed-bandit-problem-under-delayed-feedback.pdf)
([Journal](https://bphm.knu.ua/index.php/bphm/article/view/214)).
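
To make the setting concrete, here is a minimal sketch of a Bernoulli bandit whose rewards become visible only after a fixed number of rounds. It is not the code from this repository: the function `simulate`, its parameters, and the uniform-random arm choice are assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate(means, horizon, delay, seed=0):
    """Play uniformly at random; each reward is observed `delay` rounds late.

    Hypothetical helper for illustration, not part of delayed_bandit.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # arrival round -> [(arm, reward), ...]
    pulls = [0] * len(means)     # observed pulls per arm
    wins = [0] * len(means)      # observed successes per arm
    best = max(means)
    regret = 0.0                 # cumulative pseudo-regret
    for t in range(horizon):
        arm = rng.randrange(len(means))
        reward = 1 if rng.random() < means[arm] else 0
        # The reward is generated now but scheduled to arrive later
        pending[t + delay].append((arm, reward))
        regret += best - means[arm]
        # Deliver every piece of feedback that is due in this round
        for seen_arm, seen_reward in pending.pop(t, []):
            pulls[seen_arm] += 1
            wins[seen_arm] += seen_reward
    return pulls, wins, regret


if __name__ == "__main__":
    print(simulate([0.2, 0.8], horizon=1000, delay=50))
```

With `delay=50` and `horizon=1000`, the feedback of the last 50 rounds never arrives, so a policy would see only 950 observations by the end of the run.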

Structure of the project and currently implemented algorithms:

||Files|
|-|-|
|Environments|[Protocol](delayed_bandit/environments/environment.py)|
||[Bernoulli MAB](delayed_bandit/environments/bernoulli_bandit.py)|
|Policies|[Protocol](delayed_bandit/policies/policy.py)|
||[Uniform Random](delayed_bandit/policies/uniform_random.py)|
||[Explore-First](delayed_bandit/policies/etc.py)|
||[Epsilon-Greedy](delayed_bandit/policies/epsilon_greedy.py)|
||[Upper Confidence Bound](delayed_bandit/policies/ucb.py)|
||[Thompson Sampling (Beta distribution)](delayed_bandit/policies/beta_thompson_sampling.py)|
|Experiments|[Bernoulli MAB under delayed feedback](delayed_bandit/experiments.py)|
|Tests|[Test module](delayed_bandit/test/)|
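
As an illustration of how one of the policies in the table above can cope with delays, the sketch below computes the standard UCB1 index from the feedback observed so far. The class `DelayedUCB` and its methods `choose` and `feed` are hypothetical names, not the interface defined in `delayed_bandit/policies/policy.py`, and the tie-breaking rule is an assumption.

```python
import math


class DelayedUCB:
    """UCB1 index built only from feedback that has already arrived.

    Hypothetical sketch; the method names are not the repository's protocol.
    """

    def __init__(self, num_arms):
        self.plays = [0] * num_arms    # times each arm was chosen
        self.counts = [0] * num_arms   # rewards observed per arm
        self.sums = [0.0] * num_arms   # sum of observed rewards per arm
        self.round = 0                 # number of choices made so far

    def _index(self, arm):
        if self.counts[arm] == 0:
            return math.inf            # no feedback yet: explore this arm
        mean = self.sums[arm] / self.counts[arm]
        bonus = math.sqrt(2 * math.log(self.round) / self.counts[arm])
        return mean + bonus

    def choose(self):
        self.round += 1
        # Ties (e.g. several arms without feedback) go to the least played arm
        arm = max(range(len(self.plays)),
                  key=lambda a: (self._index(a), -self.plays[a]))
        self.plays[arm] += 1
        return arm

    def feed(self, arm, reward):
        """Record a reward when it arrives, possibly many rounds after the pull."""
        self.counts[arm] += 1
        self.sums[arm] += reward
```

Breaking ties by the number of plays keeps the policy from pulling a single arm repeatedly during the initial rounds, when no feedback has arrived yet.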

To run experiments on the Bernoulli MAB, see the command-line help:

```
python delayed_bandit/experiments.py --help
```

It is usually worth running a large number of experiments and aggregating the results by removing outliers and averaging.
The sampled delays can be kept fixed over the horizon.
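
One possible way to aggregate repeated runs is a trimmed mean, which drops the most extreme results before averaging. This is a sketch under assumptions: the function `trimmed_mean` and the `trim` fraction are illustrative and are not taken from `experiments.py`.

```python
import statistics


def trimmed_mean(values, trim=0.1):
    """Average after dropping the lowest and highest `trim` fraction of values.

    Hypothetical helper; the repository may remove outliers differently.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # number of values cut from each end
    kept = ordered[k:len(ordered) - k]    # unchanged when k == 0
    return statistics.fmean(kept)


if __name__ == "__main__":
    final_regrets = [41.0, 39.5, 40.2, 38.9, 120.0]  # made-up numbers for the demo
    print(trimmed_mean(final_regrets, trim=0.2))
```

The same function can be applied per round across runs to obtain an averaged regret curve.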

![Bernoulli MAB under delayed feedback with Explore-First algorithm](bernoulli-mab-explore-then-commit.png)

![Comparison of algorithms in Bernoulli MAB with no delays](all-algorithms-no-delay.png)

![Comparison of algorithms in Bernoulli MAB under delay t=50](all-algorithms-delay-50.png)

![Comparison of algorithms in Bernoulli MAB under delay t=150](all-algorithms-delay-150.png)

### Development

```
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
./pychecks.sh
```

MIT License

Copyright (c) 2023 Andrii Dzhoha
