Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/antares0982/mab-algorithm-template
Code template for multi-armed bandit algorithm
machine-leaning multi-armed-bandit reinforcement-learning static-learn-programming static-learning
Last synced: 8 days ago
JSON representation
Code template for multi-armed bandit algorithm
- Host: GitHub
- URL: https://github.com/antares0982/mab-algorithm-template
- Owner: Antares0982
- License: mit
- Created: 2021-08-13T08:01:10.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-20T06:00:44.000Z (almost 2 years ago)
- Last Synced: 2024-08-10T07:52:23.812Z (5 months ago)
- Topics: machine-leaning, multi-armed-bandit, reinforcement-learning, static-learn-programming, static-learning
- Language: Python
- Homepage:
- Size: 139 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# MAB-algorithm
![image](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white)![image](https://img.shields.io/badge/SciPy-654FF0?style=for-the-badge&logo=SciPy&logoColor=white)![image](https://img.shields.io/badge/Numpy-777BB4?style=for-the-badge&logo=numpy&logoColor=white)![image](https://img.shields.io/github/license/Antares0982/MAB-algorithm-template)
[Code template for Multi-Armed Bandit algorithms](https://github.com/Antares0982/MAB-algorithm-template)
### MAB model
In probability theory and machine learning, the **multi-armed bandit problem** is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma.
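As a quick illustration of the exploration–exploitation tradeoff (a minimal sketch in plain NumPy, independent of this template; the arm means and parameters below are made up for demonstration):
```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.3])  # unknown to the player
counts = np.zeros(3)                    # times each arm has been played
sums = np.zeros(3)                      # total reward collected per arm
eps = 0.1

for t in range(1000):
    if counts.min() == 0 or rng.random() < eps:
        arm = int(rng.integers(3))               # explore: pick a random arm
    else:
        arm = int(np.argmax(sums / counts))      # exploit: pick the best-looking arm
    reward = rng.normal(true_means[arm], 1.0)    # draw a noisy reward
    counts[arm] += 1
    sums[arm] += reward
```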
### How To Use
Requires Python 3.
```bash
pip3 install MAB-algorithm
```

### A Brief Introduction
##### Arms
An MAB model includes a list of arms. This template defines several arm types with different reward distributions. For example, to define 10 arms with truncated normal distributions:
```python
from MAB_algorithm import *
armlist = [TruncNormArm(name=i, mu=i, sigma=1) for i in range(10)]
```

The pre-defined arms, and how to use them, are documented in each arm's docstring.
The template also defines an easy-to-use heavy-tailed distribution, which appears frequently in MAB problems.
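As an illustration (a sketch only; the constructor arguments mirror the Monte Carlo example further below), heavy-tailed arms are created the same way as other arms:
```python
from MAB_algorithm import *

# Heavy-tailed arms; constructor arguments follow the Monte Carlo example below.
heavy_armlist = [heavyTailArm(name=i, maxMomentOrder=2, mean=i / 10, mainbound=1.5) for i in range(10)]
```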
The class `armList` provides some useful helper functions for working with a list of arms.
To define a new type of arm, inherit from the base class `Arm` and implement the following methods:
```python
import numpy as np
from MAB_algorithm import *


class userDefinedArm(Arm):
    def __init__(self, name, *args, **kwargs):
        # Change the initialization parameters for your own purpose, but keep `name`.
        super().__init__(name)
        # Implementation goes here.
        ...

    def _get_rewards(self, size: int) -> np.ndarray:
        """Return the list of rewards drawn from the user-defined distribution
        when the arm is played `size` times."""
        # Implementation goes here.
        ...

    def optimal_rewards(self) -> float:
        """Return the mean of the reward distribution."""
        # Implementation goes here.
        ...

    def variance(self) -> float:
        """Return the variance of the reward distribution.
        If you don't need the variance, just ignore this."""
        # Implementation goes here.
        ...
```

Note: for how to define a customized distribution, see [Python: how to define customized distributions?](https://stackoverflow.com/questions/46055690/python-how-to-define-customized-distributions).
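For instance, a hypothetical Bernoulli arm (not part of the template) could be sketched against the interface above as follows:
```python
import numpy as np
from MAB_algorithm import *


class BernoulliArm(Arm):
    """Hypothetical example arm: reward is 1 with probability `p`, otherwise 0."""

    def __init__(self, name, p: float):
        super().__init__(name)
        self.p = p

    def _get_rewards(self, size: int) -> np.ndarray:
        # Draw `size` i.i.d. Bernoulli(p) rewards.
        return np.random.binomial(1, self.p, size=size).astype(float)

    def optimal_rewards(self) -> float:
        return self.p

    def variance(self) -> float:
        return self.p * (1 - self.p)
```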
##### Algorithms
An MAB algorithm is a strategy for choosing arms so as to maximize the collected reward. To define an algorithm, inherit from `MABAlgorithm` and implement `select_arm`. The information available to `select_arm` is listed in its docstring.
```python
import numpy as np
from MAB_algorithm import *


class SimpleMAB(MABAlgorithm):
    def select_arm(self, count: int, *args, **kwargs) -> int:
        if count < len(self._arms):
            return count
        # Play the arm with the best average reward so far.
        return np.argmax(self.mean)
```

It is easy to implement your own algorithm by extending the pre-defined methods, and you can freely use the pre-defined attributes:

* `_reward_sum[i]`: the sum of rewards collected from arm `i`.
* `_counts[i]`: the number of times arm `i` has been played.
* `optimal_strategy_rewards`: the expected reward of the ideal strategy in which the arm distributions are known (i.e. the arm with the best reward is always chosen).
* `collected_rewards`: the rewards collected so far.
* `expected_rewards`: the expected total reward under the current strategy.
* `optimal_arm_rewards`: the mean of the reward distribution of the best arm.
The pre-defined `mean` property is the arithmetic mean of the reward history; it can be overridden with your own mean estimator, which can then be used in your `select_arm`.
```python
from typing import List

import numpy as np
from MAB_algorithm import *


class MyAlgorithm(MABAlgorithm):
    """
    Use `mean[i] + (t**0.25) / (s[i]**0.25)` as the mean estimator,
    where `t` is the time step and `s[i]` is the number of times arm `i` was played.
    Note: this algorithm may have linear-order regret for heavy-tailed
    arms when the maximal order of finite moments is 4.
    """

    @property
    def mean(self) -> List[float]:
        mean = super().mean
        total = sum(self._counts)
        for i in range(len(self._arms)):
            mean[i] += np.power(total, 0.25) / np.power(self._counts[i], 0.25)
        return mean

    def select_arm(self, count: int, *args, **kwargs) -> int:
        if count < len(self._arms):
            return count
        return np.argmax(self.mean)
```

Also, the pre-defined `regret` attribute gives the regret of the current strategy.
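For example, a UCB1-style strategy can be sketched using only the pre-defined attributes described above (an illustrative sketch, not part of the template):
```python
import numpy as np
from MAB_algorithm import *


class UCB1(MABAlgorithm):
    """Illustrative UCB1-style strategy built on the pre-defined attributes."""

    def select_arm(self, count: int, *args, **kwargs) -> int:
        n_arms = len(self._arms)
        if count < n_arms:
            return count  # play each arm once first
        counts = np.asarray(self._counts, dtype=float)
        bonus = np.sqrt(2.0 * np.log(count) / counts)
        return int(np.argmax(np.asarray(self.mean) + bonus))
```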
For an example of a more complicated algorithm, see the implementation of `robustUCB` in the source code. To change the base logic of an MAB algorithm, override `_update_current_states`, `_update_rewards_info`, and `_after_draw`; to include more information in the output, override `_simulation_result`. The template also provides helper functions for root finding with Newton's method.
##### Simulations
To start your simulation, use the method `run_simulation` for iterator mode:
```python
from MAB_algorithm import *
armlist = [TruncNormArm(name=i, mu=i, sigma=1) for i in range(10)]
dsee = DSEE(arms=armlist, w=10)
dseegenerator = dsee.run_simulation(10000) # T=10000
i = 0
for ans in dseegenerator:
    # `ans` is a dict containing the information of the algorithm at time step `i`.
    # The iterator mode is recommended: a simulation can be time-consuming, and the
    # iterator lets you store whatever information you need at each step.
    print(ans)
    i += 1
```

Or use `run_simulation_tolist` to get all the information at once for the given number of time steps.
```python
from MAB_algorithm import *
armlist = [TruncNormArm(name=i, mu=i, sigma=1) for i in range(10)]
dsee = DSEE(arms=armlist, w=10)
allinfo = dsee.run_simulation_tolist(10000) # T=10000.
# This step is time-consuming and may produce no log output while it runs.
# `allinfo` will be a list of 10000 dicts.
```

To run the algorithm again with other (or the same) parameters, implement `restart` properly so that it clears the information stored in the algorithm object.
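For example, one possible way to track the regret trajectory during an iterator-mode run (a sketch, assuming the `regret` attribute described above can be read at each step):
```python
from MAB_algorithm import *

armlist = [TruncNormArm(name=i, mu=i, sigma=1) for i in range(10)]
dsee = DSEE(arms=armlist, w=10)

regret_history = []
for ans in dsee.run_simulation(10000):
    # Record the regret of the current strategy after each time step.
    regret_history.append(dsee.regret)
```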
##### Run Monte Carlo experiments
To run a Monte Carlo experiment, first pass the algorithm class and all the data your algorithm needs to `MAB_MonteCarlo`. Monte Carlo experiments can use multiprocessing to speed things up. (This feature may change later for better performance.)
```python
from MAB_algorithm import *
armlist = [heavyTailArm(name=i, maxMomentOrder=2, mean=i/10, mainbound=1.5) for i in range(10)]
mctest = MAB_MonteCarlo(DSEE, arms=armlist, w=10) # `w` is the initial parameter needed by DSEE
gen = mctest.run_monte_carlo_to_list(repeatTimes=1000, iterations=10000, useCores=8)
# Use 8 processes for 1000 independent experiments of 10000 time steps each.
i = 0
for ans in gen:
    # `ans` is also a dict containing the necessary information such as average reward and regret,
    # while the information of each individual experiment is dropped.
    # If every detail is needed, pass `needDetails=True` to `run_monte_carlo_to_list`.
    print(ans)
    i += 1
```

##### MAB with multiple players
> To be implemented
### Contribute
To contribute to this repo (e.g. add a benchmark algorithm, or an arm with another distribution), please create a pull request, or open an [issue](https://github.com/Antares0982/MAB-algorithm-template/issues) to tell me what you need.