# Multi-Armed Bandit
- Implementation of Multi-Armed Bandits using the epsilon-greedy algorithm in Python.
- This implementation is a modified version of: https://github.com/SahanaRamnath/MultiArmedBandit_RL
- The algorithm is applied to the 10-armed test bed and to an ads optimization dataset.
- The rest of the documentation is coming soon.

## Introduction
Multi-Armed Bandit (MAB) is a Machine Learning framework in which an agent has to select actions (also called arms) in order to maximize its cumulative reward in the long term. In each round, the agent receives some information about the current state (the context), then chooses an action based on this information and the experience gathered in previous rounds. At the end of the round, the agent receives the reward associated with the chosen action.
In its simplest form, the Multi-Armed Bandit problem is as follows: you are faced with k slot machines (i.e., a “k-armed bandit”). When the arm of a machine is pulled, it has some unknown probability of dispensing a unit of reward (e.g., $1). The task is to pull one arm at a time so as to maximize the total reward accumulated over time, i.e., to get the best payout while not losing too much money along the way.
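As a reference point, in the standard k-armed bandit notation (not spelled out in this README), each arm $a$ has an unknown true value $q_*(a)$, and the agent keeps a sample-average estimate $Q_t(a)$ of it from the rewards it has observed:

$$
q_*(a) \;=\; \mathbb{E}\!\left[R_t \mid A_t = a\right],
\qquad
Q_t(a) \;=\; \frac{\sum_{i=1}^{t-1} R_i \,\mathbf{1}[A_i = a]}{\sum_{i=1}^{t-1} \mathbf{1}[A_i = a]},
$$

where $A_t$ is the arm pulled in round $t$ and $R_t$ is the reward received.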
Trying each machine once and then sticking with the one that paid the most would not be a good strategy: the agent could end up committing to a machine that had a lucky outcome at the beginning but is suboptimal in general. Instead, the agent should periodically return to machines that do not currently look promising, in order to collect more information about them. This is the main challenge in Multi-Armed Bandits: the agent has to strike the right balance between exploiting prior knowledge and exploring, so as to avoid overlooking the optimal actions.
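To make the trade-off concrete, here is a minimal, self-contained epsilon-greedy sketch. It is not the code of this repository; the arm count, reward distributions, and `epsilon` value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

k = 10                                   # number of arms
epsilon = 0.1                            # probability of exploring a random arm
true_means = rng.normal(0, 1, size=k)    # hidden reward means (unknown to the agent)

q_estimates = np.zeros(k)                # current value estimate of each arm
pull_counts = np.zeros(k)                # how many times each arm has been pulled

for step in range(1000):
    # Explore with probability epsilon, otherwise exploit the current best estimate
    if rng.random() < epsilon:
        arm = int(rng.integers(k))
    else:
        arm = int(np.argmax(q_estimates))

    reward = rng.normal(true_means[arm], 1.0)   # noisy reward from the chosen arm

    # Incremental sample-average update of the estimate for the pulled arm
    pull_counts[arm] += 1
    q_estimates[arm] += (reward - q_estimates[arm]) / pull_counts[arm]

print("Best arm (true):     ", int(np.argmax(true_means)))
print("Best arm (estimated):", int(np.argmax(q_estimates)))
```

With a small but nonzero epsilon the agent mostly exploits its current best estimate, yet every arm keeps being sampled, so the estimates continue to improve over time.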

## Definition

## The 10-Armed Test Bed

## Ads Dataset

## Conclusions

## References

## Code Requirements
The code was run and tested using the following:
- Python 3.10.11
- matplotlib 3.9.0
- seaborn 0.13.2
- numpy 1.26.3
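
For reference, one way to install the listed package versions (assuming a pip-based environment; this README does not ship a requirements file):

```bash
pip install numpy==1.26.3 matplotlib==3.9.0 seaborn==0.13.2
```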