https://github.com/kapshaul/onlinelearning
Repository of Online Learning algorithms, including Bandits, UCB, and more.
- Host: GitHub
- URL: https://github.com/kapshaul/onlinelearning
- Owner: kapshaul
- Created: 2024-08-08T13:42:30.000Z (about 1 year ago)
- Default Branch: bandits-comparison-analysis
- Last Pushed: 2024-09-28T13:45:25.000Z (about 1 year ago)
- Last Synced: 2025-01-19T17:59:32.307Z (9 months ago)
- Topics: adaptive-ad, adversarial-learning, bandit, bandit-learning, linear-regression, machine-learning, online-learning, upper-confidence-bound
- Language: Python
- Homepage:
- Size: 5.13 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Comparison of Bandit Algorithms
## Overview
This repository includes implementations and performance reports of several bandit algorithms. The study covers:

1. [Explore-then-Commit](#1-explore-then-commit)
2. [Upper Confidence Bound (UCB)](#2-upper-confidence-bound-ucb)
3. [Thompson Sampling](#3-thompson-sampling)
4. [Linear UCB (LinUCB)](#4-linear-ucb-linucb)
5. [Linear Thompson Sampling (LinTS)](#5-linear-thompson-sampling-lints)
6. [Generalized Linear Model Bandit (GLM)](#6-generalized-linear-model-glm-bandit-non-linear-bandit)
## Implementation
To simulate a specific algorithm, edit the `Simulation.py` script by enabling the desired algorithm and disabling the others. For example, to run the UCB algorithm with $\alpha = 0.5$, update the code as follows:
```python
## Initiate Bandit Algorithms ##
algorithms = {}
#algorithms['EpsilonGreedyLinearBandit'] = EpsilonGreedyLinearBandit(dimension=context_dimension, lambda_=0.1, epsilon=None)
#algorithms['EpsilonGreedyMultiArmedBandit'] = EpsilonGreedyMultiArmedBandit(num_arm=n_articles, epsilon=0.1)
#algorithms['ExplorethenCommit'] = ExplorethenCommit(num_arm=n_articles, m=30)
algorithms['UCBBandit'] = UCBBandit(num_arm=n_articles, alpha=0.5)
#algorithms['ThompsonSamplingGaussianMAB'] = ThompsonSamplingGaussianMAB(num_arm=n_articles)
#algorithms['LinearUCBBandit'] = LinearUCBBandit(dimension=context_dimension, lambda_=0.1, alpha=0.5) #delta=0.05, alpha=2.358
#algorithms['LinearThompsonSamplingMAB'] = LinearThompsonSamplingMAB(dimension=context_dimension, lambda_=0.1)
```

After selecting your algorithm, run the `Simulation.py` script.
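The regret numbers reported in the sections below come from simulation runs of this kind. As a point of reference, here is a minimal, self-contained regret-tracking loop; it is independent of the repository's `Simulation.py`, and the arm means, Bernoulli rewards, horizon, and random placeholder policy are illustrative assumptions.

```python
# Illustrative only: a minimal cumulative-regret loop, not the repository's Simulation.py.
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.uniform(0.0, 1.0, size=10)   # unknown expected reward of each arm (assumed)
best_mean = true_means.max()

cumulative_regret = 0.0
for t in range(1, 5001):
    arm = int(rng.integers(len(true_means)))          # placeholder policy: replace with a bandit rule
    reward = float(rng.random() < true_means[arm])    # Bernoulli reward draw; a real policy would learn from it
    cumulative_regret += best_mean - true_means[arm]  # regret vs. always playing the best arm

print(f"Cumulative regret after 5000 rounds: {cumulative_regret:.2f}")
```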
## 1. Explore-then-Commit
### Result
| Hyperparameter (m) | Cumulative Regret |
|:------------------:|:-----------------:|
| 10 | 1001.40 |
| 20 | 214.90 |
| 30 | 334.02 |
**Figure 1**: Explore-then-Commit accumulated regret for (a) m = 10, (b) m = 20, (c) m = 30.
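For reference, here is a minimal Explore-then-Commit sketch in which `m` plays the same role as in the table above (pulls per arm before committing). The Bernoulli reward model, arm means, and horizon are illustrative assumptions rather than the repository's exact setup.

```python
# Explore-then-Commit sketch: explore each arm m times, then commit to the empirical best arm.
import numpy as np

def explore_then_commit(true_means, m, horizon, rng):
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    estimates = np.zeros(n_arms)
    best_mean = max(true_means)
    regret = 0.0
    for t in range(horizon):
        if t < m * n_arms:
            arm = t % n_arms                     # exploration phase: round-robin, m pulls per arm
        else:
            arm = int(np.argmax(estimates))      # commit phase: exploit the empirical best arm
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best_mean - true_means[arm]
    return regret

rng = np.random.default_rng(0)
print(explore_then_commit([0.3, 0.5, 0.7], m=30, horizon=5000, rng=rng))
```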
## 2. Upper Confidence Bound (UCB)
### Reward Estimation + Confidence Bound
$$
\text{UCB} = \hat u_{t-1,i} + \sqrt{\frac{2 \ln t}{S_{t-1,i}}}
$$

### Result
| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.1 | 256.50 |
| 0.5 | 977.03 |
| 1.0 | 1906.65 |
**Figure 2**: UCB Bandit accumulated regret for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.0.
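A minimal sketch of the UCB rule above follows. Treating $\alpha$ as a multiplier on the confidence bonus, as suggested by the hyperparameter table and the `UCBBandit(num_arm=n_articles, alpha=0.5)` call in the implementation section, is an assumption; the Bernoulli rewards and arm means are likewise illustrative.

```python
# UCB(alpha) sketch: index_i = mean_i + alpha * sqrt(2 * ln(t) / pulls_i).
import math
import numpy as np

def ucb(true_means, alpha, horizon, rng):
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                          # pull every arm once to initialise the estimates
        else:
            bonus = alpha * np.sqrt(2.0 * math.log(t) / counts)
            arm = int(np.argmax(means + bonus))  # optimism: estimate plus confidence bonus
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += best - true_means[arm]
    return regret

print(ucb([0.3, 0.5, 0.7], alpha=0.5, horizon=5000, rng=np.random.default_rng(0)))
```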
## 3. Thompson Sampling
### Posterior Distribution

$$
N\!\left( \hat u_{t-1,i},\ \frac{1}{S_{t-1,i} + 1} \right)
$$

### Result
| Cumulative Regret |
|:------------------:|
| 100 |
**Figure 3**: Thompson Sampling accumulated regret
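A minimal Gaussian Thompson Sampling sketch based on the posterior above: each round, draw one sample per arm from $N\left(\hat u_{t-1,i}, \frac{1}{S_{t-1,i}+1}\right)$ and play the arm with the largest sample. The Gaussian reward noise and arm means are illustrative assumptions.

```python
# Gaussian Thompson Sampling sketch: posterior sampling with variance 1 / (pulls + 1).
import numpy as np

def thompson_gaussian(true_means, horizon, rng):
    n_arms = len(true_means)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        samples = rng.normal(means, np.sqrt(1.0 / (counts + 1.0)))  # one posterior draw per arm
        arm = int(np.argmax(samples))
        reward = true_means[arm] + rng.normal(0.0, 0.1)             # Gaussian reward (assumed)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += best - true_means[arm]
    return regret

print(thompson_gaussian([0.3, 0.5, 0.7], horizon=5000, rng=np.random.default_rng(0)))
```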
## 4. Linear UCB (LinUCB)
### Parameter Estimation

$$
\hat \theta_{t+1} = A^{-1}_{t+1} b_{t+1}
$$

### Reward Estimation + Confidence Bound
$$
\text{UCB} = x^T \hat \theta_t + \alpha \sqrt{x^T A^{-1} x}
$$

### Result
| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.5 | 24.43 |
| 1.5 | 177.89 |
| 2.5 | 487.73 |
**Figure 4**: Linear UCB accumulated regret for (a) $\alpha$ = 0.5, (b) $\alpha$ = 1.5, (c) $\alpha$ = 2.5.
**Figure 5**: Linear UCB estimation error for (a) $\alpha$ = 0.5, (b) $\alpha$ = 1.5, (c) $\alpha$ = 2.5.
## 5. Linear Thompson Sampling (LinTS)
### Posterior Distribution
$$
N\!\left( \hat{\theta}_t,\ A^{-1}_t \right)
$$

### Result
| Cumulative Regret |
|:------------------:|
| 1098.24 |
**Figure 6**: Linear Thompson Sampling accumulated regret and estimation error.
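A minimal LinTS sketch that follows the posterior above: draw $\tilde\theta \sim N(\hat\theta_t, A^{-1}_t)$ and play the context with the largest predicted reward $x^T \tilde\theta$. The environment details (contexts, noise, ridge prior) are illustrative assumptions.

```python
# Linear Thompson Sampling sketch: one posterior draw of theta per round.
import numpy as np

def lin_ts(theta_star, lambda_, horizon, n_arms, rng):
    d = len(theta_star)
    A = lambda_ * np.eye(d)
    b = np.zeros(d)
    regret = 0.0
    for _ in range(horizon):
        contexts = rng.normal(size=(n_arms, d))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b
        theta_tilde = rng.multivariate_normal(theta_hat, A_inv)  # draw from N(theta_hat, A^{-1})
        arm = int(np.argmax(contexts @ theta_tilde))
        x = contexts[arm]
        reward = x @ theta_star + rng.normal(0.0, 0.1)
        A += np.outer(x, x)
        b += reward * x
        expected = contexts @ theta_star
        regret += expected.max() - expected[arm]
    return regret

rng = np.random.default_rng(0)
theta_star = np.array([0.6, -0.2, 0.3, 0.1, 0.5])
print(lin_ts(theta_star, lambda_=0.1, horizon=3000, n_arms=10, rng=rng))
```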
## 6. Generalized Linear Model (GLM) Bandit: Non-linear Bandit
### Modified Non-Linear Reward Function for Testing

$$
R = (x^T \theta)^2 + \epsilon, \text{ where } \epsilon \sim N(\mu, \sigma^2)
$$

### GLM Parameter Estimation (MLE)
$$
\hat \theta_{t+1} = \arg\max_{\theta} P(r \mid \theta) = A^{-1}_{t+1} b_{t+1}
$$

### GLM UCB
$$
\text{UCB}_{\text{GLM}} = f(x^T \hat \theta_t) + \alpha \sqrt{x^T A^{-1} x} = (x^T \hat \theta_t)^2 + \alpha \sqrt{x^T A^{-1} x}
$$

### Result
| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.1 | 62.16 |
| 0.5 | 727.63 |
| 1.5 | 5948.48 |
**Figure 7**: GLM-UCB accumulated regret for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.5.
**Figure 8**: GLM-UCB estimation error for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.5.
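A minimal sketch of the GLM-UCB variant above with link $f(z) = z^2$. Following the estimation formula given above, $\hat\theta$ is kept as the ridge estimate $A^{-1} b$; the contexts and noise level are illustrative assumptions.

```python
# GLM-UCB sketch with squared link: score(x) = (x^T theta_hat)^2 + alpha * sqrt(x^T A^{-1} x),
# reward = (x^T theta)^2 + Gaussian noise, matching the modified reward function above.
import numpy as np

def glm_ucb(theta_star, alpha, lambda_, horizon, n_arms, rng):
    d = len(theta_star)
    A = lambda_ * np.eye(d)
    b = np.zeros(d)
    regret = 0.0
    for _ in range(horizon):
        contexts = rng.normal(size=(n_arms, d))
        contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b                                     # ridge estimate, per the formula above
        width = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        scores = (contexts @ theta_hat) ** 2 + alpha * width      # link f applied to the linear score
        arm = int(np.argmax(scores))
        x = contexts[arm]
        reward = (x @ theta_star) ** 2 + rng.normal(0.0, 0.1)     # non-linear reward
        A += np.outer(x, x)
        b += reward * x
        expected = (contexts @ theta_star) ** 2
        regret += expected.max() - expected[arm]
    return regret

rng = np.random.default_rng(0)
theta_star = np.array([0.6, -0.2, 0.3, 0.1, 0.5])
print(glm_ucb(theta_star, alpha=0.1, lambda_=0.1, horizon=3000, n_arms=10, rng=rng))
```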