https://github.com/hartikainen/information-theoretic-bandit
- Host: GitHub
- URL: https://github.com/hartikainen/information-theoretic-bandit
- Owner: hartikainen
- Archived: true
- Created: 2017-07-27T04:58:31.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-08-01T06:15:05.000Z (almost 8 years ago)
- Last Synced: 2025-02-09T12:17:20.598Z (3 months ago)
- Topics: bandit-learning, information-theory, information-to-go, k-armed-bandit, multi-arm-bandits, perception-action-cycle, reinforcement-learning, value-to-go
- Language: Python
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# information-theoretic-bandit
This repository implements the testbed for the multi-armed bandit problem presented in Chapter 2 of [1].

## To run
To run a single bandit experiment:
```
usage: testbed.py [-h] [-v VERBOSE] [--num_runs NUM_RUNS]
[--num_arms NUM_ARMS] [--arms_mean ARMS_MEAN]
[--arms_mean_params [ARMS_MEAN_PARAMS [ARMS_MEAN_PARAMS ...]]]
[--arms_std ARMS_STD]
[--arms_std_params [ARMS_STD_PARAMS [ARMS_STD_PARAMS ...]]]
[--agent_class AGENT_CLASS] [--num-episodes NUM_EPISODES]
[--results-file RESULTS_FILE] [--epsilon EPSILON]

K-armed bandit testbed
optional arguments:
-h, --help show this help message and exit
-v VERBOSE, --verbose VERBOSE
Verbose
--num_runs NUM_RUNS Number of runs for the experiment
--num_arms NUM_ARMS, -k NUM_ARMS
Number of arms for the bandit
--arms_mean ARMS_MEAN
Distribution to draw the mean of each arm from. Should
correspond to a distribution in the numpy.random module.
Use --arms_mean_params to change the parameters passed
to the distribution. For example, --arms_mean=constant
--arms_mean_params=1 results in a bandit whose arms all
have a mean of 1, and --arms_mean=normal
--arms_mean_params 0 1 results in a bandit whose arm
means are drawn from numpy.random.normal(0, 1).
--arms_mean_params [ARMS_MEAN_PARAMS [ARMS_MEAN_PARAMS ...]]
Params to be passed to the distribution function
specified by the --arms_mean argument.
--arms_std ARMS_STD Distribution to draw the standard deviation of each
arm from. See the help for --arms_mean for more information.
--arms_std_params [ARMS_STD_PARAMS [ARMS_STD_PARAMS ...]]
Params to be passed to the distribution function
specified by the --arms_std argument.
--agent_class AGENT_CLASS, --agent_cls AGENT_CLASS
Name of the class (defined in agents.py) to be used as
an agent. Defaults to agents.DefaultAgent.
--num-episodes NUM_EPISODES
Number of episodes
--results-file RESULTS_FILE
File to write results to
--epsilon EPSILON epsilon for epsilon-greedy exploration
```
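
The `--arms_mean`/`--arms_mean_params` mechanism described in the help text resolves a distribution by name. The repository's actual resolution code isn't shown here, so the following is only a sketch of how such a lookup could work; the helper name `sample_arm_values` and the special-casing of `constant` are assumptions:

```python
import numpy as np

def sample_arm_values(dist_name, params, k):
    # Hypothetical helper mirroring the --arms_mean/--arms_mean_params
    # semantics: "constant" is assumed to be special-cased, since
    # numpy.random has no distribution by that name; any other name
    # is looked up on the numpy.random module.
    if dist_name == "constant":
        return np.full(k, params[0])
    dist = getattr(np.random, dist_name)  # e.g. np.random.normal
    return dist(*params, size=k)

# Equivalent of --arms_mean=normal --arms_mean_params 0 1 for k=10:
means = sample_arm_values("normal", [0.0, 1.0], 10)
print(means)
```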

For example, to run a 10-armed bandit where the arm means are drawn from a normal distribution with mean `μ=0` and standard deviation `σ=1.0`, using epsilon-greedy exploration with `epsilon=0.1` and writing the results to `./results/epsilon-0.1.pickle`:
```
python ./testbed.py --num_arms=10 --arms_mean="normal" --arms_mean_params 0 1 --epsilon=0.1 --results-file="./results/epsilon-0.1.pickle"
```
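
The `--agent_class` flag loads an agent class defined in `agents.py`, defaulting to `agents.DefaultAgent`. The repository's agent interface isn't documented here, so this is a hedged sketch of what a plug-in epsilon-greedy agent could look like; the class name, constructor signature, and `act`/`update` method names are all assumptions:

```python
import numpy as np

class EpsilonGreedyAgent:
    # Hypothetical agent in the spirit of agents.DefaultAgent; not
    # the repository's actual interface.

    def __init__(self, num_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.Q = np.zeros(num_arms)  # sample-average value estimates
        self.N = np.zeros(num_arms)  # pull counts per arm

    def act(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the current greedy arm.
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.Q))
        return int(np.argmax(self.Q))

    def update(self, arm, reward):
        # Incremental sample-average update (Sutton & Barto, Ch. 2).
        self.N[arm] += 1
        self.Q[arm] += (reward - self.Q[arm]) / self.N[arm]
```

With an interface along these lines, the testbed would be invoked with `--agent_class=EpsilonGreedyAgent`.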
## To visualize the results
The results recorded by `testbed.py` can be visualized with `vis.py`:
```
usage: vis.py [-h] --results-file RESULTS_FILE

Data visualization for K-armed bandit.
optional arguments:
-h, --help show this help message and exit
--results-file RESULTS_FILE
File to read results from
```
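
The recorded results can also be inspected directly without the plotting script. A minimal sketch, assuming only that the file is a standard pickle (its exact structure is defined by `testbed.py`):

```python
import pickle

# Load whatever object testbed.py pickled; its exact structure
# (per-run rewards, action counts, etc.) is defined by the testbed,
# so this only inspects what was stored.
with open("./results/epsilon-0.1.pickle", "rb") as f:
    results = pickle.load(f)

print(type(results))
```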
For example, to visualize results from the example above (`./results/epsilon-0.1.pickle`):
```
python ./vis.py --results-file="./results/epsilon-0.1.pickle"
```

[1] Sutton, R. S., & Barto, A. G. (1998). *Reinforcement Learning: An Introduction*. Cambridge, MA: MIT Press.