Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yaricom/rl-playground
The RL playground. Experiments with RL algorithms
learning-agents openai reinforcement-learning rl-playground
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/yaricom/rl-playground
- Owner: yaricom
- License: MIT
- Created: 2016-05-09T21:00:38.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-05-10T14:16:09.000Z (over 8 years ago)
- Last Synced: 2024-11-05T15:51:49.357Z (about 2 months ago)
- Topics: learning-agents, openai, reinforcement-learning, rl-playground
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# RL-playground
This repository contains some of my experiments with Reinforcement Learning algorithms, based on the [OpenAI Gym toolkit](https://gym.openai.com).

## Overview
Packages:
- [openai/envs](https://github.com/yaricom/RL-playground/tree/master/openai/envs) - the OpenAI Gym compatible environments used for evaluation
- [openai/agents](https://github.com/yaricom/RL-playground/tree/master/openai/agents) - the learning agents

Environments:
- [NArmedBanditEnv](https://github.com/yaricom/RL-playground/blob/master/openai/envs/classic/narmedbandit.py) - N-armed bandit (stationary and nonstationary); a minimal sketch of such an environment follows below
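A stationary N-armed bandit exposes N arms whose true mean rewards are fixed; the agent only observes the noisy reward of the arm it pulls. The sketch below shows what such an environment could look like against the classic `gym.Env` interface that the usage example further down assumes (old-style `reset()` and four-tuple `step()`); the class name, reward model, and spaces are illustrative assumptions, not the repository's actual implementation in `narmedbandit.py`.

```python
import numpy as np
import gym
from gym import spaces


class NArmedBanditStationarySketchEnv(gym.Env):
    """Illustrative stationary N-armed bandit: each arm pays a noisy
    reward drawn around a fixed, hidden true value."""

    def __init__(self, num_arms=10):
        self.num_arms = num_arms
        self.action_space = spaces.Discrete(num_arms)
        self.observation_space = spaces.Discrete(1)  # bandits have no real state
        self._true_values = None

    def reset(self):
        # draw the hidden true value of every arm once per episode
        self._true_values = np.random.normal(0.0, 1.0, size=self.num_arms)
        return 0

    def step(self, action):
        # reward is the chosen arm's true value plus unit-variance noise;
        # a bandit episode never terminates on its own, so done stays False
        reward = np.random.normal(self._true_values[action], 1.0)
        return 0, reward, False, {}
```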
Learning agents:
- [SampleAverageActionValueAgent](https://github.com/yaricom/RL-playground/blob/master/openai/agents/sampleaverage.py) - a learning agent based on the sample-average action-value selection algorithm, for both stationary and nonstationary environments; the idea is sketched below
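The sample-average method keeps, for every action, the running mean of the rewards observed so far and selects actions from those estimates (here epsilon-greedily), using the incremental update Q(a) ← Q(a) + (R − Q(a)) / N(a). The sketch below mirrors the `evaluate(reward, done)` call used in the usage example, but its internals (epsilon-greedy exploration, the stationary sample-average update) are illustrative assumptions rather than the repository's actual `SampleAverageActionValueAgent`.

```python
import numpy as np


class SampleAverageSketchAgent:
    """Illustrative epsilon-greedy agent with sample-average value estimates."""

    def __init__(self, num_actions=10, epsilon=0.1):
        self.epsilon = epsilon
        self.q = np.zeros(num_actions)       # running mean reward per action
        self.counts = np.zeros(num_actions)  # how often each action was taken
        self.last_action = None

    def evaluate(self, reward, done):
        # credit the reward from the previous step to the action that produced it
        if self.last_action is not None:
            a = self.last_action
            self.counts[a] += 1
            self.q[a] += (reward - self.q[a]) / self.counts[a]
        # epsilon-greedy selection over the current estimates
        if np.random.random() < self.epsilon:
            action = np.random.randint(len(self.q))
        else:
            action = int(np.argmax(self.q))
        self.last_action = action
        return action
```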
## Usage
```python
import gym
from openai.agents.sampleaverage import SampleAverageActionValueAgent


def main():
    # load the environment
    env = gym.make('10ArmedBanditStationary-v0')
    # set up the learning agent
    agent = SampleAverageActionValueAgent(num_actions=10)

    episode_count = 1
    max_steps = 100
    reward = 0
    done = False

    for i in xrange(episode_count):
        ob = env.reset()
        for j in xrange(max_steps):
            action = agent.evaluate(reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break


if __name__ == '__main__':
    main()
```
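Note how the driver loop is structured: each call to `agent.evaluate(reward, done)` passes in the reward produced by the previous action (initially `0`) and returns the next action to play, which is then fed to `env.step(action)`.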