Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ThomasLecat/gym-bandit-environments
Multi-armed bandits environments for OpenAI Gym
- Host: GitHub
- URL: https://github.com/ThomasLecat/gym-bandit-environments
- Owner: ThomasLecat
- License: MIT
- Created: 2017-10-20T10:02:03.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-05-27T22:53:33.000Z (about 6 years ago)
- Last Synced: 2024-02-10T05:42:30.830Z (5 months ago)
- Language: Python
- Size: 13.7 KB
- Stars: 10
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-deep-reinforcement-learning - ThomasLecat/gym-bandit-environments
README
# Bandit Environments
Series of n-armed bandit environments for the OpenAI Gym
This code is inspired by Jesse Cooper's work:
https://github.com/JKCooper2/gym-bandits

The environments added in this repository are based on the experiments of Wang et al. described in the paper Learning to Reinforcement Learn:
https://arxiv.org/abs/1611.05763

#### Notes
Each environment uses a different set of:
* Probability Distributions - A list of probabilities of the likelihood that a particular bandit will pay out
* Reward Distributions - A list of either rewards (if a number) or means and standard deviations (if a list) of the payout that bandit has

E.g. BanditTwoArmedHighLowFixed-v0 has `p_dist=[0.8, 0.2]`, `r_dist=[1, 1]`, meaning 80% of the time that action 0 is selected it pays out 1, and 20% of the time action 1 is selected it pays out 1.

You can access the distributions through `env.p_dist` and `env.r_dist` if you want to compare your learned weights against the true values when plotting the results of various algorithms.

To fit the universe-starter-agent, the observation of the bandits has been modified from 0 (type: gym.spaces.Discrete) to [0] (type: gym.spaces.box.Box).
Some of the environments return additional information about the arms, such as the index of the optimal arm or the value of a parameter.
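As an illustration of the `p_dist`/`r_dist` mechanics described above, here is a minimal stand-alone sketch (a hypothetical re-implementation for clarity, not the repository's own code; the function name `bandit_step` is an assumption):

```python
import random

random.seed(0)

def bandit_step(action, p_dist, r_dist):
    """Simulate one pull: with probability p_dist[action] the arm pays out.

    An r_dist entry is either a fixed reward (a number) or a [mean, std]
    pair, in which case the payout is drawn from a Gaussian.
    """
    if random.random() < p_dist[action]:
        entry = r_dist[action]
        if isinstance(entry, (list, tuple)):   # [mean, std] -> Gaussian payout
            return random.gauss(entry[0], entry[1])
        return float(entry)                    # fixed payout
    return 0.0

# Parameters of BanditTwoArmedHighLowFixed-v0 from the text
p_dist, r_dist = [0.8, 0.2], [1, 1]
mean_reward = sum(bandit_step(0, p_dist, r_dist) for _ in range(10000)) / 10000
# Pulling arm 0 repeatedly yields an average reward near 0.8
```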
### List of Environments
New in this repository:
* BanditTwoArmedIndependentUniform-v0: The two arms return a reward of 1 with probabilities p1 and p2 ~ U[0,1]
* BanditTwoArmedDependentUniform-v0: The first arm returns a reward of 1 with probability p ~ U[0,1], the second arm with probability 1-p
* BanditTwoArmedDependentEasy-v0: The first arm returns a reward of 1 with probability p ~ U{0.1,0.9}, the second arm with probability 1-p
* BanditTwoArmedDependentMedium-v0: The first arm returns a reward of 1 with probability p ~ U{0.25,0.75}, the second arm with probability 1-p
* BanditTwoArmedDependentHard-v0: The first arm returns a reward of 1 with probability p ~ U{0.4,0.6}, the second arm with probability 1-p
* BanditElevenArmedWithIndex: One optimal arm always returns a reward of 5, the other arms return a reward of 1.1, and the 11th arm returns a reward of 0.1

Other environments:
* BanditTwoArmedDeterministicFixed-v0: Simplest case where one bandit always pays, and the other always doesn't
* BanditTwoArmedHighLowFixed-v0: Stochastic version with a large difference between which bandit pays out of two choices
* BanditTwoArmedHighHighFixed-v0: Stochastic version with a small difference between which bandit pays where both are good
* BanditTwoArmedLowLowFixed-v0: Stochastic version with a small difference between which bandit pays where both are bad
* BanditTenArmedRandomFixed-v0: 10 armed bandit with random probabilities assigned to payouts
* BanditTenArmedRandomRandom-v0: 10 armed bandit with random probabilities assigned to both payouts and rewards
* BanditTenArmedUniformDistributedReward-v0: 10 armed bandit that always pays out with a reward selected from a uniform distribution
* BanditTenArmedGaussian-v0: 10 armed bandit described on page 30 of [Reinforcement Learning: An Introduction](https://www.dropbox.com/s/b3psxv2r0ccmf80/book2015oct.pdf?dl=0) (Sutton and Barto)

### Installation
```
git clone https://github.com/ThomasLecat/gym-bandit-environments.git
cd gym-bandit-environments
pip install -e .
```

In your gym environment:
```
import gym
import gym_bandits
env = gym.make("BanditTenArmedGaussian-v0") # Replace with relevant env
```
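To see how the learned weights mentioned above can be matched against the true distributions, here is a minimal epsilon-greedy loop, sketched against a stand-in two-armed bandit rather than the gym environment (so the `step` function and parameter values are illustrative assumptions):

```python
import random

random.seed(0)

# Stand-in for a two-armed bandit such as BanditTwoArmedHighLowFixed-v0
p_dist = [0.8, 0.2]

def step(action):
    """Return reward 1 with probability p_dist[action], else 0."""
    return 1.0 if random.random() < p_dist[action] else 0.0

epsilon = 0.1
q = [0.0, 0.0]   # estimated value of each arm
n = [0, 0]       # pull counts
for _ in range(5000):
    if random.random() < epsilon:
        action = random.randrange(2)                 # explore
    else:
        action = max(range(2), key=lambda a: q[a])   # exploit
    reward = step(action)
    n[action] += 1
    q[action] += (reward - q[action]) / n[action]    # incremental sample mean
```

After training, `q` approximates `p_dist`, which is exactly the comparison `env.p_dist` enables when using the real environments.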