https://github.com/hartikainen/information-theoretic-bandit
- Host: GitHub
- URL: https://github.com/hartikainen/information-theoretic-bandit
- Owner: hartikainen
- Archived: true
- Created: 2017-07-27T04:58:31.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-08-01T06:15:05.000Z (almost 8 years ago)
- Last Synced: 2025-02-09T12:17:20.598Z (3 months ago)
- Topics: bandit-learning, information-theory, information-to-go, k-armed-bandit, multi-arm-bandits, perception-action-cycle, reinforcement-learning, value-to-go
- Language: Python
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# information-theoretic-bandit
This repository implements the testbed for the multi-armed bandit problem presented in Chapter 2 of [1].

## To run
To run a single bandit experiment:
```
usage: testbed.py [-h] [-v VERBOSE] [--num_runs NUM_RUNS]
[--num_arms NUM_ARMS] [--arms_mean ARMS_MEAN]
[--arms_mean_params [ARMS_MEAN_PARAMS [ARMS_MEAN_PARAMS ...]]]
[--arms_std ARMS_STD]
[--arms_std_params [ARMS_STD_PARAMS [ARMS_STD_PARAMS ...]]]
[--agent_class AGENT_CLASS] [--num-episodes NUM_EPISODES]
[--results-file RESULTS_FILE] [--epsilon EPSILON]

K-armed bandit testbed
optional arguments:
-h, --help show this help message and exit
-v VERBOSE, --verbose VERBOSE
Verbose
--num_runs NUM_RUNS Number of runs for the experiment
--num_arms NUM_ARMS, -k NUM_ARMS
Number of arms for the bandit
--arms_mean ARMS_MEAN
Distribution to draw the mean of each arm from. Should
correspond to a distribution in the numpy.random module.
Use --arms_mean_params to change the parameters passed
to the distribution. For example, --arms_mean=constant
--arms_mean_params=1 results in a bandit whose arms all
have a mean of 1, and --arms_mean=normal
--arms_mean_params 0 1 results in a bandit whose arm
means are drawn from numpy.random.normal(0, 1).
--arms_mean_params [ARMS_MEAN_PARAMS [ARMS_MEAN_PARAMS ...]]
Params to be passed to the distribution function
specified by the --arms_mean argument.
--arms_std ARMS_STD Distribution to draw the standard deviation of each
arm from. See the help for --arms_mean for more information.
--arms_std_params [ARMS_STD_PARAMS [ARMS_STD_PARAMS ...]]
Params to be passed to the distribution function
specified by the --arms_std argument.
--agent_class AGENT_CLASS, --agent_cls AGENT_CLASS
Name of the class (defined in agents.py) to be used as
an agent. Defaults to agents.DefaultAgent.
--num-episodes NUM_EPISODES
Number of episodes
--results-file RESULTS_FILE
File to write results to
--epsilon EPSILON epsilon for epsilon-greedy exploration
```
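
The `--arms_mean`/`--arms_mean_params` mechanism described in the help text resolves a distribution by name. The repository's actual resolution code isn't shown here, so the following is only a sketch of how such a lookup could work; the helper name `sample_arm_values` and the special-casing of `constant` are assumptions:

```python
import numpy as np

def sample_arm_values(dist_name, params, k):
    # Hypothetical helper mirroring the --arms_mean/--arms_mean_params
    # semantics: "constant" is assumed to be special-cased, since
    # numpy.random has no distribution by that name; any other name
    # is looked up on the numpy.random module.
    if dist_name == "constant":
        return np.full(k, params[0])
    dist = getattr(np.random, dist_name)  # e.g. np.random.normal
    return dist(*params, size=k)

# Equivalent of --arms_mean=normal --arms_mean_params 0 1 for k=10:
means = sample_arm_values("normal", [0.0, 1.0], 10)
print(means)
```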

For example, to run a 10-armed bandit where the arm means are drawn from a normal distribution with mean `μ=0` and standard deviation `σ=1.0`, using epsilon-greedy exploration with `epsilon=0.1` and writing the results to `./results/epsilon-0.1.pickle`:
```
python ./testbed.py --num_arms=10 --arms_mean="normal" --arms_mean_params 0 1 --epsilon=0.1 --results-file="./results/epsilon-0.1.pickle"
```
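
The `--agent_class` flag loads an agent class defined in `agents.py`, defaulting to `agents.DefaultAgent`. The repository's agent interface isn't documented here, so this is a hedged sketch of what a plug-in epsilon-greedy agent could look like; the class name, constructor signature, and `act`/`update` method names are all assumptions:

```python
import numpy as np

class EpsilonGreedyAgent:
    # Hypothetical agent in the spirit of agents.DefaultAgent; not
    # the repository's actual interface.

    def __init__(self, num_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.Q = np.zeros(num_arms)  # sample-average value estimates
        self.N = np.zeros(num_arms)  # pull counts per arm

    def act(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the current greedy arm.
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.Q))
        return int(np.argmax(self.Q))

    def update(self, arm, reward):
        # Incremental sample-average update (Sutton & Barto, Ch. 2).
        self.N[arm] += 1
        self.Q[arm] += (reward - self.Q[arm]) / self.N[arm]
```

With an interface along these lines, the testbed would be invoked with `--agent_class=EpsilonGreedyAgent`.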
## To visualize the results
The results recorded by `testbed.py` can be visualized with `vis.py`:
```
usage: vis.py [-h] --results-file RESULTS_FILE

Data visualization for K-armed bandit.
optional arguments:
-h, --help show this help message and exit
--results-file RESULTS_FILE
File to read results from
```
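
The recorded results can also be inspected directly without the plotting script. A minimal sketch, assuming only that the file is a standard pickle (its exact structure is defined by `testbed.py`):

```python
import pickle

# Load whatever object testbed.py pickled; its exact structure
# (per-run rewards, action counts, etc.) is defined by the testbed,
# so this only inspects what was stored.
with open("./results/epsilon-0.1.pickle", "rb") as f:
    results = pickle.load(f)

print(type(results))
```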
For example, to visualize results from the example above (`./results/epsilon-0.1.pickle`):
```
python ./vis.py --results-file="./results/epsilon-0.1.pickle"
```

[1] Sutton, R. S., & Barto, A. G. (1998). *Reinforcement Learning: An Introduction*. Cambridge, MA: MIT Press.