https://github.com/djo/delayed-bandit
Multi-armed bandit problem under delayed feedback: framework for the numerical experiments
- Host: GitHub
- URL: https://github.com/djo/delayed-bandit
- Owner: djo
- License: mit
- Created: 2021-02-14T19:41:57.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-04-13T16:52:28.000Z (about 3 years ago)
- Last Synced: 2025-03-30T05:11:33.750Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 187 KB
- Stars: 8
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Multi-armed bandit (MAB) problem under delayed feedback: numerical experiments

A framework for numerical experiments that simulate the multi-armed bandit problem in a stochastic stationary environment with delayed feedback.
## Beta Upper Confidence Bound Policy for the Design of Clinical Trials, 2023
Evaluates delay-adapted policies on the publicly available International Stroke Trial dataset. See [this notebook](Beta-Upper-Confidence-Bound-Policy-for-the-Design-of-Clinical-Trials.ipynb) for the analysis and simulation.
## Bernoulli multi-armed bandit problem under delayed feedback, 2021
Provides a framework for numerical experiments that simulate the multi-armed bandit problem
in a stochastic stationary environment with delays. The code is part of the paper [Bernoulli multi-armed bandit problem under delayed feedback](https://djo.github.io/assets/bernoulli-multi-armed-bandit-problem-under-delayed-feedback.pdf)
([Journal](https://bphm.knu.ua/index.php/bphm/article/view/214)).
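To make the setting concrete, here is a minimal, self-contained sketch of a Bernoulli bandit whose rewards reach the policy only after a fixed delay. It does not use this repository's classes: the function `simulate`, its parameters, the fixed (rather than sampled) delay, and the epsilon-greedy rule are all assumptions chosen for illustration.

```python
import heapq
import random


def simulate(means, horizon, delay, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit with rewards arriving `delay` steps late.

    Illustrative only. Returns (pulls per arm, expected cumulative regret).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms      # how often each arm was played
    observed = [0] * n_arms   # how many rewards have arrived per arm
    totals = [0.0] * n_arms   # sum of the rewards that have arrived per arm
    pending = []              # min-heap of (arrival_step, arm, reward)
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Deliver every reward whose delay has elapsed by step t.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            observed[arm] += 1
            totals[arm] += reward

        # Decide from observed feedback only; rewards still in flight are invisible.
        unseen = [a for a in range(n_arms) if observed[a] == 0]
        if unseen or rng.random() < epsilon:
            arm = rng.choice(unseen or range(n_arms))
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / observed[a])

        # The reward is drawn now but becomes visible at step t + 1 + delay.
        reward = 1 if rng.random() < means[arm] else 0
        heapq.heappush(pending, (t + 1 + delay, arm, reward))
        pulls[arm] += 1
        regret += best - means[arm]

    return pulls, regret


pulls, regret = simulate(means=[0.2, 0.8], horizon=500, delay=10, seed=1)
```

With `delay=0` each reward is available for the very next decision; larger delays force the policy to act on stale estimates for longer, which is the effect the experiments are meant to measure.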
Structure of the project and currently implemented algorithms:
|Component|Files|
|-|-|
|Environments|[Protocol](delayed_bandit/environments/environment.py)|
||[Bernoulli MAB](delayed_bandit/environments/bernoulli_bandit.py)|
|Policies|[Protocol](delayed_bandit/policies/policy.py)|
||[Uniform Random](delayed_bandit/policies/uniform_random.py)|
||[Explore-First](delayed_bandit/policies/etc.py)|
||[Epsilon-Greedy](delayed_bandit/policies/epsilon_greedy.py)|
||[Upper Confidence Bound](delayed_bandit/policies/ucb.py)|
||[Thompson Sampling (Beta distribution)](delayed_bandit/policies/beta_thompson_sampling.py)|
|Experiments|[Bernoulli MAB under delayed feedback](delayed_bandit/experiments.py)|
|Tests|[Test module](delayed_bandit/test/)|
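The table lists a "Protocol" file for both environments and policies. As a rough illustration of that layout, the sketch below assumes this means a `typing.Protocol`-style structural interface and pairs it with a textbook UCB1 policy. The method names `choose` and `feed_reward` and the class `UCB1` are hypothetical and are not taken from `policy.py` or `ucb.py`.

```python
import math
from typing import Protocol


class Policy(Protocol):
    """Hypothetical interface: pick an arm, then learn from feedback when it arrives."""

    def choose(self) -> int: ...

    def feed_reward(self, arm: int, reward: float) -> None: ...


class UCB1:
    """Textbook UCB1 whose indices use only the rewards observed so far."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms   # observed rewards per arm
        self.sums = [0.0] * n_arms   # sum of observed rewards per arm

    def choose(self) -> int:
        # Play any arm that has no observed feedback yet.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Empirical mean plus an exploration bonus that shrinks with observations.
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def feed_reward(self, arm: int, reward: float) -> None:
        # Under delayed feedback this is called when a reward arrives,
        # which may be many steps after the arm was chosen.
        self.counts[arm] += 1
        self.sums[arm] += reward


policy: Policy = UCB1(n_arms=2)  # UCB1 satisfies Policy structurally, without inheritance
```

Separating `choose` from `feed_reward` is what lets one policy run unchanged with or without delays: the experiment loop decides when feedback is delivered. Note that this naive version keeps replaying the first arm until its feedback arrives; a delay-adapted policy would handle that case differently.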
To run experiments on the Bernoulli MAB, see the available options:
```
python delayed_bandit/experiments.py --help
```
One might want to run a large number of experiments and aggregate the results by removing outliers and averaging.
The sampled delays can be kept fixed over the horizon.
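One simple way to combine outlier removal and averaging is a trimmed mean over the per-run results. The sketch below is an assumption about how that aggregation could look, not the repository's own procedure; the helper `trimmed_mean` and the sample regret values are hypothetical.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Average `values` after dropping the lowest and highest `trim_fraction` of them."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    k = int(len(ordered) * trim_fraction)   # number of runs dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical cumulative regrets from five independent runs; one run is an outlier.
regrets = [10.0, 12.0, 11.0, 95.0, 9.0]
print(trimmed_mean(regrets, trim_fraction=0.2))  # -> 11.0
```

Trimming symmetrically keeps the estimate robust to an occasional run in which the policy locks onto a bad arm, at the cost of discarding some data.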

### Development
```
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
./pychecks.sh
```
MIT License
Copyright (c) 2023 Andrii Dzhoha