reinforce
=============
![Build Status](https://travis-ci.org/Sentenai/reinforce.svg?branch=master)

`reinforce` is a library which exports an openai-gym-like typeclass, `MonadEnv`, along with both an interface to the [`gym-http-api`][gym-http] and Haskell-native environments, which provide a substantial speed-up over the HTTP-server interface.
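
To give a rough idea of the shape of that interface, here is a simplified, hypothetical sketch based only on the functions and constructors used in the example agent below (`reset`, `step`, `Initial`, `Obs`); the real definitions live in `Control.MonadEnv` in this repo and may differ:

```haskell
{-# LANGUAGE FunctionalDependencies #-}
-- A simplified, hypothetical sketch -- not the library's exact definitions.
module MonadEnvSketch where

-- What an agent may observe when an episode starts:
data Initial s
  = Initial s         -- an initial state
  | EmptyEpisode      -- no episode could be started

-- What an agent may observe after taking a step:
data Obs r s
  = Next r s          -- a reward and the next state
  | Done r (Maybe s)  -- the episode ended, with a final reward and maybe a final state
  | Terminated        -- the environment terminated the episode

-- An environment monad can be reset to start an episode and stepped with an action.
class Monad m => MonadEnv m s a r | m -> s a r where
  reset :: m (Initial s)
  step  :: a -> m (Obs r s)
```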

This is an environment-first library, with basic reinforcement learning algorithms being developed on branches in subpackages (see [Development and Milestones](#development-and-milestones)).
`reinforce` is currently an "alpha" release: it still needs some work defining formal structures around what state-spaces and action-spaces should look like. However, Haskell's type system is expressive enough that this seems to be more of a "nice-to-have".

This repo is in active development and has some beginner-friendly contribution opportunities, from porting new gym environments to implementing new algorithms. Because this library is not on Hackage, if you would like to see the haddocks, [you can find them here](https://sentenai.github.io/reinforce/).

[gym-http]: https://github.com/openai/gym-http-api/

An example agent
=============

In `reinforce-zoo/bandits/examples/`, you can find an agent which showcases some of the functionality of this library.

```haskell
{-# LANGUAGE LambdaCase #-}
module Main where

import Reinforce.Prelude
-- ^ NoImplicitPrelude is on

import Environments.CartPole (Environment, runEnvironment_)
import Control.MonadEnv (Initial(..), Obs(..))

import qualified Control.MonadEnv as Env (step, reset)
import qualified Environments.CartPole as Env (StateCP)
-- StateCP - an "observation" or "the state of the agent"; note that "State" is overloaded, hence StateCP
-- Action  - a performable action in the environment
import qualified Reinforce.Spaces.Action as Actions (randomChoice)

main :: IO ()
main = runEnvironment_ gogoRandomAgent
  where
    gogoRandomAgent :: Environment ()
    gogoRandomAgent = forM_ [0..maxEpisodes] $ \_ ->
      Env.reset >>= \case -- this comes from LambdaCase. Sugar for: \a -> case a of ...
        EmptyEpisode -> pure ()
        Initial obs  -> do
          liftIO . print $ "Initialized episode and am in state " ++ show obs
          rolloutEpisode obs 0

    maxEpisodes :: Int
    maxEpisodes = 100

    -- this is usually the structure of a rollout:
    rolloutEpisode :: Env.StateCP -> Double -> Environment ()
    rolloutEpisode obs totalRwd = do
      a <- liftIO Actions.randomChoice
      Env.step a >>= \case
        Terminated -> pure ()
        Done r mobs ->
          liftIO . print
            $ "Done! final reward: " ++ show (totalRwd+r) ++ ", final state: " ++ show mobs
        Next r obs' -> do
          liftIO . print
            $ "Stepped with " ++ show a ++ " - reward: " ++ show r ++ ", next state: " ++ show obs'
          rolloutEpisode obs' (totalRwd+r)
```

You can build and run this with the following commands:

```
git clone https://github.com/Sentenai/reinforce
cd reinforce
stack build
stack exec random-agent-example
```

Note that if you want to run a gym environment, you'll have to run the [openai/gym-http-api][gym-http] server with the following steps:

```
git clone https://github.com/openai/gym-http-api
cd gym-http-api
pip install -r requirements.txt
python ./gym_http_server.py
```

Currently, development has been primarily focused around classic control, so if you want to add any of the Atari environments, this would be an easy contribution!

Installing
=============

`reinforce` isn't on Hackage or Stackage (yet), so your best bet is to add this git repo to your `stack.yaml` file:

```yaml
packages:
- '.'
- location:
    git: git@github.com:Sentenai/reinforce.git
    commit: 'v0.0.1'
  extra-dep: true

# This is a requirement due to some tight coupling of the gym-http-api
- location:
    git: https://github.com/stites/gym-http-api.git
    commit: '5b72789'
    subdirs:
    - binding-hs
  extra-dep: true
- ...
```

and add `reinforce` to the dependencies in your cabal file or `package.yaml` (recommended).
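
For example, with hpack the relevant part of a `package.yaml` might look like the following minimal sketch (only `reinforce` is specific to this library; the rest of your dependencies will vary):

```yaml
# package.yaml -- minimal sketch; your other dependencies will vary
dependencies:
  - base
  - reinforce
```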

Development and Milestones
=============

If you want to contribute, you're in luck! There is a range of things to do, from tasks suited to beginner Haskellers all the way to ones for advanced Pythonistas!

Please [file an issue mentioning where you'd like to help](https://github.com/Sentenai/reinforce/issues), or track down @stites in the [dataHaskell gitter](https://gitter.im/dataHaskell/) or directly through [keybase.io](https://keybase.io/stites).

While you can check the [Github issues](https://github.com/Sentenai/reinforce/issues), here are some items off the top of my head which could use some immediate attention (and may also need to be filed).

A few quick environment contributions might be the following:
- [#1](https://github.com/Sentenai/reinforce/issues/1) (easy) - Add an Atari environment to the api (like Pong! Others might require committing directly to `gym-http-api`)
- [#8](https://github.com/Sentenai/reinforce/issues/8) (med) - Port Richard Sutton's Acrobot code to haskell
- [#6](https://github.com/Sentenai/reinforce/issues/6) (hard) - Break the dependency on the `openai/gym-http-api` server -- this would speed up performance considerably
- [#9](https://github.com/Sentenai/reinforce/issues/9) (harder) - Render the haskell CartPole environment with SDL

Some longer-running algorithmic contributions which would take place on the `algorithms` or `deep-rl` branches might be:
- [#10](https://github.com/Sentenai/reinforce/issues/10) (easy) - Convert algorithms into agents
- [#11](https://github.com/Sentenai/reinforce/issues/11) (med) - Add a testable "convergence" criterion
- [#12](https://github.com/Sentenai/reinforce/issues/12) (med) - Implement some eligibility trace variants on the `algorithms` branch
- [#13](https://github.com/Sentenai/reinforce/issues/13) (med) - Add some policy gradient methods to the `algorithms` branch
- [#14](https://github.com/Sentenai/reinforce/issues/14) (hard) - Head over to the `deep-rl` branch and convert some of the deep reinforcement learning models into haskell with [tensorflow-haskell][tfhs], and/or [backprop][bp]

For a longer-term view, feel free to check out [Milestones](https://github.com/Sentenai/reinforce/milestones).

[tfhs]:https://github.com/tensorflow/haskell
[bp]:https://github.com/mstksg/backprop

Contributors
======================

Thanks goes to these wonderful people ([emoji key](https://github.com/kentcdodds/all-contributors#emoji-key)):

| [Sam Stites](https://www.stites.io)<br />[💻](https://github.com/stites/reinforce/commits?author=stites "Code") [🤔](#ideas-stites "Ideas, Planning, & Feedback") [📖](https://github.com/stites/reinforce/commits?author=stites "Documentation") | [Mitchell Rosen](https://github.com/mitchellwrosen)<br />[🤔](#ideas-mitchellwrosen "Ideas, Planning, & Feedback") | [Anastasia Aizman](https://github.com/anastasia)<br />[📖](https://github.com/stites/reinforce/commits?author=anastasia "Documentation") |
| :---: | :---: | :---: |

This project follows the [all-contributors](https://github.com/kentcdodds/all-contributors) specification. Contributions of any kind welcome!