Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yaricom/rl-playground
The RL playground. Experiments with RL algorithms
learning-agents openai reinforcement-learning rl-playground
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/yaricom/rl-playground
- Owner: yaricom
- License: MIT
- Created: 2016-05-09T21:00:38.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-05-10T14:16:09.000Z (over 8 years ago)
- Last Synced: 2024-11-05T15:51:49.357Z (about 2 months ago)
- Topics: learning-agents, openai, reinforcement-learning, rl-playground
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# RL-playground
This repository contains some of my experiments with Reinforcement Learning algorithms, based on the [OpenAI Gym toolkit](https://gym.openai.com).

## Overview
Packages:
- [openai/envs](https://github.com/yaricom/RL-playground/tree/master/openai/envs) - the OpenAI Gym compatible environments used for evaluation
- [openai/agents](https://github.com/yaricom/RL-playground/tree/master/openai/agents) - the learning agents

Environments:
- [NArmedBanditEnv](https://github.com/yaricom/RL-playground/blob/master/openai/envs/classic/narmedbandit.py) - N-armed bandit (stationary and nonstationary); a minimal sketch of such an environment follows below
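A stationary N-armed bandit exposes N arms whose true mean rewards are fixed; the agent only observes the noisy reward of the arm it pulls. The sketch below shows what such an environment could look like against the classic `gym.Env` interface that the usage example further down assumes (old-style `reset()` and four-tuple `step()`); the class name, reward model, and spaces are illustrative assumptions, not the repository's actual implementation in `narmedbandit.py`.

```python
import numpy as np
import gym
from gym import spaces


class NArmedBanditStationarySketchEnv(gym.Env):
    """Illustrative stationary N-armed bandit: each arm pays a noisy
    reward drawn around a fixed, hidden true value."""

    def __init__(self, num_arms=10):
        self.num_arms = num_arms
        self.action_space = spaces.Discrete(num_arms)
        self.observation_space = spaces.Discrete(1)  # bandits have no real state
        self._true_values = None

    def reset(self):
        # draw the hidden true value of every arm once per episode
        self._true_values = np.random.normal(0.0, 1.0, size=self.num_arms)
        return 0

    def step(self, action):
        # reward is the chosen arm's true value plus unit-variance noise;
        # a bandit episode never terminates on its own, so done stays False
        reward = np.random.normal(self._true_values[action], 1.0)
        return 0, reward, False, {}
```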
Learning agents:
- [SampleAverageActionValueAgent](https://github.com/yaricom/RL-playground/blob/master/openai/agents/sampleaverage.py) - a learning agent based on the sample-average action-value selection algorithm, for both stationary and nonstationary environments; the idea is sketched below
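The sample-average method keeps, for every action, the running mean of the rewards observed so far and selects actions from those estimates (here epsilon-greedily), using the incremental update Q(a) ← Q(a) + (R − Q(a)) / N(a). The sketch below mirrors the `evaluate(reward, done)` call used in the usage example, but its internals (epsilon-greedy exploration, the stationary sample-average update) are illustrative assumptions rather than the repository's actual `SampleAverageActionValueAgent`.

```python
import numpy as np


class SampleAverageSketchAgent:
    """Illustrative epsilon-greedy agent with sample-average value estimates."""

    def __init__(self, num_actions=10, epsilon=0.1):
        self.epsilon = epsilon
        self.q = np.zeros(num_actions)       # running mean reward per action
        self.counts = np.zeros(num_actions)  # how often each action was taken
        self.last_action = None

    def evaluate(self, reward, done):
        # credit the reward from the previous step to the action that produced it
        if self.last_action is not None:
            a = self.last_action
            self.counts[a] += 1
            self.q[a] += (reward - self.q[a]) / self.counts[a]
        # epsilon-greedy selection over the current estimates
        if np.random.random() < self.epsilon:
            action = np.random.randint(len(self.q))
        else:
            action = int(np.argmax(self.q))
        self.last_action = action
        return action
```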
## Usage
```python
import gym
from openai.agents.sampleaverage import SampleAverageActionValueAgent


def main():
    # load the environment
    env = gym.make('10ArmedBanditStationary-v0')
    # set up the learning agent
    agent = SampleAverageActionValueAgent(num_actions=10)

    episode_count = 1
    max_steps = 100
    reward = 0
    done = False

    for i in xrange(episode_count):
        ob = env.reset()
        for j in xrange(max_steps):
            action = agent.evaluate(reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break


if __name__ == '__main__':
    main()
```
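Note how the driver loop is structured: each call to `agent.evaluate(reward, done)` passes in the reward produced by the previous action (initially `0`) and returns the next action to play, which is then fed to `env.step(action)`.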