https://github.com/rl-tools/rl-tools
A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
continuous-control cpp deep-learning mujoco reinforcement-learning robotics tinyml tinyrl
- Host: GitHub
- URL: https://github.com/rl-tools/rl-tools
- Owner: rl-tools
- License: mit
- Created: 2023-11-11T02:29:10.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-05-23T00:41:18.000Z (over 1 year ago)
- Last Synced: 2024-05-23T00:55:16.033Z (over 1 year ago)
- Topics: continuous-control, cpp, deep-learning, mujoco, reinforcement-learning, robotics, tinyml, tinyrl
- Language: C++
- Homepage: https://rl.tools
- Size: 4.46 MB
- Stars: 135
- Watchers: 8
- Forks: 5
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-deep-rl - RLtools - The fastest deep reinforcement learning library for continuous control, implemented in pure, dependency-free C++ (Python bindings available as well). (Libraries)
README
# RLtools: The Fastest Deep Reinforcement Learning Library
Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio
Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)
Trained on a 2020 MacBook Pro (M1) using RLtools PPO/Multi-Agent PPO
Trained in 18s on a 2020 MacBook Pro (M1) using RLtools TD3

## Benchmarks

Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC respectively)
Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)
Benchmarks of the inference frequency for a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures).

## Quick Start
Clone this repo, then build a Zoo example:
```
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp
```
Run it with `./a.out 1337` (the number is the seed), then run `python3 -m http.server` to visualize the results. Open `http://localhost:8000` and navigate to the ExTrack UI to watch the quadrotor flying.

- **macOS**: Append `-framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE` for fast training (~4s on M3)
- **Ubuntu**: Install OpenBLAS (`apt install libopenblas-dev`) and append `-lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS` (~6s on Zen 5); full commands are sketched below.
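Putting the pieces together, the full build-and-run sequence might look like the following (a sketch assuming you are in the repository root; adjust the flags to your platform):

```
# macOS (Apple Silicon): build with the Accelerate backend
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp -framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE

# Ubuntu: build with OpenBLAS (requires `apt install libopenblas-dev`)
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp -lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS

# Train with seed 1337, then serve the working directory to inspect the results at http://localhost:8000
./a.out 1337
python3 -m http.server
```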
## Algorithms

| Algorithm | Example |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **TD3** | [Pendulum](./src/rl/environments/pendulum/td3/cpu/standalone.cpp), [Racing Car](./src/rl/environments/car/car.cpp), [MuJoCo Ant-v4](./src/rl/environments/mujoco/ant/td3/training.h), [Acrobot](./src/rl/environments/acrobot/td3/acrobot.cpp) |
| **PPO** | [Pendulum](./src/rl/environments/pendulum/ppo/cpu/training.cpp), [Racing Car](./src/rl/environments/car/training_ppo.h), [MuJoCo Ant-v4 (CPU)](./src/rl/environments/mujoco/ant/ppo/cpu/training.h), [MuJoCo Ant-v4 (CUDA)](./src/rl/environments/mujoco/ant/ppo/cuda/training_ppo.cu) |
| **Multi-Agent PPO** | [Bottleneck](./src/rl/zoo/bottleneck-v0/ppo.h) |
| **SAC** | [Pendulum (CPU)](./src/rl/environments/pendulum/sac/cpu/training.cpp), [Pendulum (CUDA)](./src/rl/environments/pendulum/sac/cuda/sac.cu), [Acrobot](./src/rl/environments/acrobot/sac/acrobot.cpp) |

## Projects Based on RLtools
- Learning to Fly in Seconds: [GitHub](https://github.com/arplaboratory/learning-to-fly) / [arXiv](https://arxiv.org/abs/2311.13081) / [YouTube](https://youtu.be/NRD43ZA1D-4) / [IEEE Spectrum](https://spectrum.ieee.org/amp/drone-quadrotor-2667196800)
- Data-Driven System Identification of Quadrotors Subject to Motor Delays: [GitHub](https://github.com/arplaboratory/data-driven-system-identification) / [arXiv](https://arxiv.org/abs/2404.07837) / [YouTube](https://youtu.be/G3WGthRx2KE) / [Project Page](https://sysid.tools)

# Getting Started
> **⚠️ Note**: Check out [Getting Started](https://docs.rl.tools/getting_started.html) in the documentation for a more thorough guide.

A simple example of how to implement your own environment and train a policy using PPO:
Clone and checkout:
```
git clone https://github.com/rl-tools/example
cd example
git submodule update --init external/rl_tools
```
Build and run:
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
./my_pendulum
```

Note that this example has no external dependencies and should work on any system with CMake and a C++17 compiler.
# Documentation
The documentation is available at [docs.rl.tools](https://docs.rl.tools) and consists of C++ notebooks. You can also run them locally to tinker around:

```
docker run -p 8888:8888 rltools/documentation
```
After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

| Chapter | Interactive Notebook |
|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Overview ](https://docs.rl.tools/overview.html) | - |
| [Getting Started ](https://docs.rl.tools/getting_started.html) | - |
| [Containers](https://docs.rl.tools/01-Containers.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb) |
| [Multiple Dispatch](https://docs.rl.tools/02-Multiple%20Dispatch.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=02-Multiple%20Dispatch.ipynb) |
| [Deep Learning](https://docs.rl.tools/03-Deep%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=03-Deep%20Learning.ipynb) |
| [CPU Acceleration](https://docs.rl.tools/04-CPU%20Acceleration.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=04-CPU%20Acceleration.ipynb) |
| [MNIST Classification](https://docs.rl.tools/05-MNIST%20Classification.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=05-MNIST%20Classification.ipynb) |
| [Deep Reinforcement Learning](https://docs.rl.tools/06-Deep%20Reinforcement%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb) |
| [The Loop Interface](https://docs.rl.tools/07-The%20Loop%20Interface.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=07-The%20Loop%20Interface.ipynb) |
| [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=08-Custom%20Environment.ipynb) |
| [Python Interface](https://docs.rl.tools/09-Python%20Interface.html) | [Colab](https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb) |
# Repository Structure
To build the examples from source (either in Docker or natively), first clone the repository.
Instead of cloning all submodules with `git clone --recursive`, which takes a lot of space and bandwidth, we recommend cloning the main repo (which contains all the standalone code for RLtools) and then cloning the required sets of submodules later:
```
git clone https://github.com/rl-tools/rl-tools.git rl_tools
```
#### Cloning submodules
There are five sets of submodules:
1. External dependencies (in `external/`)
   * E.g. HDF5 for checkpointing, TensorBoard for logging, or MuJoCo for the simulation of contact dynamics
2. Examples/code for embedded platforms (in `embedded_platforms/`)
3. Redistributable dependencies (in `redistributable/`)
4. Test dependencies (in `tests/lib`)
5. Test data (in `tests/data`)

These sets of submodules can be cloned incrementally and independently of each other.
For most use cases (e.g. most of the Docker examples) you should clone the submodules for the external dependencies:
```
cd rl_tools
git submodule update --init --recursive -- external
```

The submodules for the embedded platforms, the redistributable binaries, and the test dependencies/data can be cloned in the same fashion (by replacing `external` with the appropriate folder from the enumeration above).
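For instance, the remaining sets could be cloned like this (a sketch; the folder names are taken from the enumeration above):

```
# Clone the remaining sets of submodules as needed
git submodule update --init --recursive -- embedded_platforms
git submodule update --init --recursive -- redistributable
git submodule update --init --recursive -- tests/lib tests/data
```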
Note: For the redistributable dependencies and the test data, make sure `git-lfs` is installed (e.g. `sudo apt install git-lfs` on Ubuntu) and activated (`git lfs install`); otherwise only the metadata of the blobs is downloaded.

### Python Interface
We provide Python bindings that are available as `rltools` through PyPI (the pip package index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.
```
pip install rltools gymnasium
```
Usage:
```
from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d

def env_factory():
    # Create the Pendulum environment and rescale its action space to [-1, 1]
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)
state = sac.State(seed)

# Step the training loop until it signals completion
finished = False
while not finished:
    finished = state.step()
```
You can find more details in the [Python Interface documentation](https://docs.rl.tools/09-Python%20Interface.html) and in the repository [rl-tools/python-interface](https://github.com/rl-tools/python-interface).

## Embedded Platforms
### Inference & Training
- [iOS](https://github.com/rl-tools/ios)
- [Teensy](./embedded_platforms)
### Inference
- [Crazyflie](embedded_platforms/crazyflie)
- [ESP32](embedded_platforms)
- [PX4](embedded_platforms)

## Naming Convention
We use `snake_case` for variables/instances, functions, and namespaces, and `PascalCase` for structs/classes. Furthermore, we use upper-case `SNAKE_CASE` for compile-time constants.

## Citing
When using RLtools in academic work, please cite our publication using the following BibTeX entry:
```
@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
```