
# Comparison of Bandit Algorithms

## Overview
This repository includes implementations and performance reports of several bandit algorithms. The study covers:

[1. Explore-then-Commit](#1-explore-then-commit)

[2. Upper Confidence Bound (UCB)](#2-upper-confidence-bound-ucb)

[3. Thompson Sampling](#3-thompson-sampling)

[4. Linear UCB (LinUCB)](#4-linear-ucb-linucb)

[5. Linear Thompson Sampling (LinTS)](#5-linear-thompson-sampling-lints)

[6. Generalized Linear Model Bandit (GLM)](#6-generalized-linear-model-glm-bandit-non-linear-bandit)

## Implementation
To simulate a specific algorithm, edit the `Simulation.py` script by uncommenting the desired algorithm and commenting out the others.

For example, to run the UCB algorithm with $\alpha = 0.5$, update the code as follows:
```python
## Initiate Bandit Algorithms ##
algorithms = {}

#algorithms['EpsilonGreedyLinearBandit'] = EpsilonGreedyLinearBandit(dimension=context_dimension, lambda_=0.1, epsilon=None)
#algorithms['EpsilonGreedyMultiArmedBandit'] = EpsilonGreedyMultiArmedBandit(num_arm=n_articles, epsilon=0.1)
#algorithms['ExplorethenCommit'] = ExplorethenCommit(num_arm=n_articles, m=30)
algorithms['UCBBandit'] = UCBBandit(num_arm=n_articles, alpha=0.5)
#algorithms['ThompsonSamplingGaussianMAB'] = ThompsonSamplingGaussianMAB(num_arm=n_articles)
#algorithms['LinearUCBBandit'] = LinearUCBBandit(dimension=context_dimension, lambda_=0.1, alpha=0.5) #delta=0.05, alpha=2.358
#algorithms['LinearThompsonSamplingMAB'] = LinearThompsonSamplingMAB(dimension=context_dimension, lambda_=0.1)
```

After selecting your algorithm, run the `Simulation.py` script.

## 1. Explore-then-Commit
### Result



| Hyperparameter (m) | Cumulative Regret |
|:------------------:|:-----------------:|
| 10 | 1001.40 |
| 20 | 214.90 |
| 30 | 334.02 |



**Figure 1**: Explore-then-Commit accumulated regret for (a) m = 10, (b) m = 20, (c) m = 30
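
For reference, the following is a minimal sketch of the explore-then-commit loop on a simulated Gaussian bandit. It is not the repository's `ExplorethenCommit` class; the arm means, noise model, and horizon are illustrative assumptions.

```python
import numpy as np

def explore_then_commit(means, m, horizon, rng=None):
    """Sketch of Explore-then-Commit: pull every arm m times, then commit
    to the empirically best arm for the rest of the horizon."""
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(means)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    regret = 0.0
    best_mean = max(means)

    for t in range(horizon):
        if t < m * n_arms:
            arm = t % n_arms                      # exploration phase: round-robin
        else:
            arm = int(np.argmax(sums / counts))   # commit to the best empirical arm
        reward = rng.normal(means[arm], 1.0)      # assumed unit-variance Gaussian rewards
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]
    return regret

# Illustrative run with made-up arm means; regret depends strongly on the choice of m.
print(explore_then_commit([0.1, 0.3, 0.5], m=20, horizon=5000))
```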

## 2. Upper Confidence Bound (UCB)
### Reward Estimation + Confidence Bound
$$
\text{UCB}_{t,i} = \hat u_{t-1,i} + \alpha \sqrt{\frac{2 \ln t}{S_{t-1,i}}}
$$

where $\hat u_{t-1,i}$ is the empirical mean reward of arm $i$, $S_{t-1,i}$ is the number of times arm $i$ has been pulled up to round $t-1$, and $\alpha$ scales the exploration bonus.
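
A minimal sketch of how this index drives arm selection, assuming `sums` and `counts` hold per-arm reward totals and pull counts (illustrative only, not the repository's `UCBBandit` class):

```python
import numpy as np

def ucb_select(sums, counts, t, alpha):
    """Pick the arm maximizing the UCB index above.

    sums[i]   -- total reward observed from arm i
    counts[i] -- number of pulls of arm i (assumed > 0, e.g. each arm pulled once first)
    t         -- current round (1-indexed)
    alpha     -- scale of the exploration bonus
    """
    means = sums / counts
    bonus = alpha * np.sqrt(2.0 * np.log(t) / counts)
    return int(np.argmax(means + bonus))
```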

### Result



| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.1 | 256.50 |
| 0.5 | 977.03 |
| 1.0 | 1906.65 |



**Figure 2**: UCB Bandit accumulated regret for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.0

## 3. Thompson Sampling
### Posterior Distribution

$$
\tilde u_{t,i} \sim N\left( \hat u_{t-1,i}, \frac{1}{S_{t-1,i} + 1} \right)
$$
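
A minimal sampling step matching this posterior: each arm's posterior mean is its empirical mean, one sample is drawn per arm, and the arm with the largest draw is played (a sketch, not the repository's `ThompsonSamplingGaussianMAB` class):

```python
import numpy as np

def thompson_select(sums, counts, rng):
    """Gaussian Thompson Sampling step: sample each arm's mean from
    N(empirical mean, 1 / (pulls + 1)) and play the arm with the largest draw."""
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    samples = rng.normal(means, 1.0 / np.sqrt(counts + 1))
    return int(np.argmax(samples))

# Example usage with made-up statistics for three arms.
rng = np.random.default_rng(0)
print(thompson_select(np.array([3.0, 5.0, 1.0]), np.array([10, 12, 4]), rng))
```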

### Result

| Cumulative Regret |
|:------------------:|
| 100 |

**Figure 3**: Thompson Sampling accumulated regret

## 4. Linear UCB (LinUCB)
### Parameter Estimation

$$
\hat \theta_{t+1} = A^{-1}_ {t+1} b_{t+1}
$$

### Reward Estimation + Confidence Bound

$$
\text{UCB} = x^T \hat \theta_t + \alpha \sqrt{x^T A^{-1} x}
$$
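
Below is a compact sketch of the LinUCB loop under the usual assumptions, i.e. a ridge-regularized design matrix $A = \lambda I + \sum x x^T$ and reward vector $b = \sum r x$ (illustrative only, not the repository's `LinearUCBBandit` class):

```python
import numpy as np

class LinUCBSketch:
    """Minimal LinUCB sketch following the estimator and index above."""

    def __init__(self, dimension, lambda_=0.1, alpha=0.5):
        self.A = lambda_ * np.eye(dimension)   # ridge-regularized design matrix
        self.b = np.zeros(dimension)
        self.alpha = alpha

    def select(self, contexts):
        """contexts: (n_arms, dimension) array of arm feature vectors."""
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b             # theta_hat = A^{-1} b
        scores = contexts @ theta_hat + self.alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))  # x^T theta + alpha sqrt(x^T A^{-1} x)
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)               # A <- A + x x^T
        self.b += reward * x                   # b <- b + r x
```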

### Result

| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.5 | 24.43 |
| 1.5 | 177.89 |
| 2.5 | 487.73 |



**Figure 4**: Linear UCB accumulated regret for (a) $\alpha$ = 0.5, (b) $\alpha$ = 1.5, (c) $\alpha$ = 2.5




**Figure 5**: Linear UCB estimation error for (a) $\alpha$ = 0.5, (b) $\alpha$ = 1.5, (c) $\alpha$ = 2.5

## 5. Linear Thompson Sampling (LinTS)
### Posterior Distribution
$$
\tilde \theta_t \sim N(\hat{\theta}_t, A^{-1}_t)
$$

### Result

| Cumulative Regret |
|:------------------:|
| 1098.24 |


**Figure 6**: Linear Thompson Sampling accumulated regret and estimation error

## 6. Generalized Linear Model (GLM) Bandit: Non-linear Bandit
### Modified Non-Linear Reward Function for Testing

$$
R = (x^T \theta)^2 + \epsilon, \text{ where } \epsilon \sim N(\mu, \sigma^2)
$$

### GLM Parameter Estimation (MLE)

$$
\hat \theta_{t+1} = \arg\max_{\theta} P(r \mid \theta) = A^{-1}_ {t+1} b_{t+1}
$$

### GLM UCB

$$
\text{UCB}_{\text{GLM}} = f(x^T \hat \theta_t) + \alpha \sqrt{x^T A^{-1} x} = (x^T \hat \theta_t)^2 + \alpha \sqrt{x^T A^{-1} x}
$$
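
A sketch of computing this GLM-UCB index with the squared link $f(z) = z^2$, taking the stated estimator $\hat \theta = A^{-1} b$ at face value (illustrative only; the repository's GLM implementation may differ):

```python
import numpy as np

def glm_ucb_select(A, b, contexts, alpha):
    """GLM-UCB index for the squared link: (x^T theta_hat)^2 + alpha sqrt(x^T A^{-1} x)."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                              # theta_hat = A^{-1} b, as stated above
    mean_part = (contexts @ theta_hat) ** 2            # f(x^T theta_hat) with f(z) = z^2
    bonus = alpha * np.sqrt(np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))
    return int(np.argmax(mean_part + bonus))
```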

### Result

| Hyperparameter (α) | Cumulative Regret |
|:------------------:|:-----------------:|
| 0.1 | 62.16 |
| 0.5 | 727.63 |
| 1.5 | 5948.48 |



**Figure 7**: GLM-UCB accumulated regret for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.5




**Figure 8**: GLM-UCB estimation error for (a) $\alpha$ = 0.1, (b) $\alpha$ = 0.5, (c) $\alpha$ = 1.5