https://github.com/rl-tools/rl-tools
A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
continuous-control cpp deep-learning mujoco reinforcement-learning robotics tinyml tinyrl
- Host: GitHub
- URL: https://github.com/rl-tools/rl-tools
- Owner: rl-tools
- License: mit
- Created: 2023-11-11T02:29:10.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-05-23T00:41:18.000Z (over 1 year ago)
- Last Synced: 2024-05-23T00:55:16.033Z (over 1 year ago)
- Topics: continuous-control, cpp, deep-learning, mujoco, reinforcement-learning, robotics, tinyml, tinyrl
- Language: C++
- Homepage: https://rl.tools
- Size: 4.46 MB
- Stars: 135
- Watchers: 8
- Forks: 5
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-deep-rl - RLtools - The fastest deep reinforcement learning library for continuous control, implemented in pure, dependency-free C++ (Python bindings available as well). (Libraries)
README
# RLtools: The Fastest Deep Reinforcement Learning Library
Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio
Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)
Trained on a 2020 MacBook Pro (M1) using RLtools PPO/Multi-Agent PPO
Trained in 18s on a 2020 MacBook Pro (M1) using RLtools TD3

## Benchmarks

Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC respectively)
Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)
Benchmarks of the inference frequency for a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures).

## Quick Start
Clone this repo, then build a Zoo example:
```
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp
```
Run it with `./a.out 1337` (the number is the seed), then run `python3 -m http.server` to visualize the results. Open `http://localhost:8000` and navigate to the ExTrack UI to watch the quadrotor flying.

- **macOS**: Append `-framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE` for fast training (~4s on M3)
- **Ubuntu**: Install OpenBLAS (`apt install libopenblas-dev`) and append `-lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS` (~6s on Zen 5); full commands are sketched below.
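Putting the pieces together, the full build-and-run sequence might look like the following (a sketch assuming you are in the repository root; adjust the flags to your platform):

```
# macOS (Apple Silicon): build with the Accelerate backend
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp -framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE

# Ubuntu: build with OpenBLAS (requires `apt install libopenblas-dev`)
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp -lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS

# Train with seed 1337, then serve the working directory to inspect the results at http://localhost:8000
./a.out 1337
python3 -m http.server
```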
## Algorithms

| Algorithm | Example |
|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **TD3** | [Pendulum](./src/rl/environments/pendulum/td3/cpu/standalone.cpp), [Racing Car](./src/rl/environments/car/car.cpp), [MuJoCo Ant-v4](./src/rl/environments/mujoco/ant/td3/training.h), [Acrobot](./src/rl/environments/acrobot/td3/acrobot.cpp) |
| **PPO** | [Pendulum](./src/rl/environments/pendulum/ppo/cpu/training.cpp), [Racing Car](./src/rl/environments/car/training_ppo.h), [MuJoCo Ant-v4 (CPU)](./src/rl/environments/mujoco/ant/ppo/cpu/training.h), [MuJoCo Ant-v4 (CUDA)](./src/rl/environments/mujoco/ant/ppo/cuda/training_ppo.cu) |
| **Multi-Agent PPO** | [Bottleneck](./src/rl/zoo/bottleneck-v0/ppo.h) |
| **SAC** | [Pendulum (CPU)](./src/rl/environments/pendulum/sac/cpu/training.cpp), [Pendulum (CUDA)](./src/rl/environments/pendulum/sac/cuda/sac.cu), [Acrobot](./src/rl/environments/acrobot/sac/acrobot.cpp) |

## Projects Based on RLtools
- Learning to Fly in Seconds: [GitHub](https://github.com/arplaboratory/learning-to-fly) / [arXiv](https://arxiv.org/abs/2311.13081) / [YouTube](https://youtu.be/NRD43ZA1D-4) / [IEEE Spectrum](https://spectrum.ieee.org/amp/drone-quadrotor-2667196800)
- Data-Driven System Identification of Quadrotors Subject to Motor Delays: [GitHub](https://github.com/arplaboratory/data-driven-system-identification) / [arXiv](https://arxiv.org/abs/2404.07837) / [YouTube](https://youtu.be/G3WGthRx2KE) / [Project Page](https://sysid.tools)

# Getting Started
> **⚠️ Note**: Check out [Getting Started](https://docs.rl.tools/getting_started.html) in the documentation for a more thorough guide.

A simple example of how to implement your own environment and train a policy using PPO:
Clone and checkout:
```
git clone https://github.com/rl-tools/example
cd example
git submodule update --init external/rl_tools
```
Build and run:
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
./my_pendulum
```

Note that this example has no external dependencies and should work on any system with CMake and a C++17 compiler.
# Documentation
The documentation is available at [docs.rl.tools](https://docs.rl.tools) and consists of C++ notebooks. You can also run them locally to tinker around:

```
docker run -p 8888:8888 rltools/documentation
```
After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

| Chapter | Interactive Notebook |
|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Overview ](https://docs.rl.tools/overview.html) | - |
| [Getting Started ](https://docs.rl.tools/getting_started.html) | - |
| [Containers](https://docs.rl.tools/01-Containers.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb) |
| [Multiple Dispatch](https://docs.rl.tools/02-Multiple%20Dispatch.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=02-Multiple%20Dispatch.ipynb) |
| [Deep Learning](https://docs.rl.tools/03-Deep%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=03-Deep%20Learning.ipynb) |
| [CPU Acceleration](https://docs.rl.tools/04-CPU%20Acceleration.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=04-CPU%20Acceleration.ipynb) |
| [MNIST Classification](https://docs.rl.tools/05-MNIST%20Classification.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=05-MNIST%20Classification.ipynb) |
| [Deep Reinforcement Learning](https://docs.rl.tools/06-Deep%20Reinforcement%20Learning.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb) |
| [The Loop Interface](https://docs.rl.tools/07-The%20Loop%20Interface.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=07-The%20Loop%20Interface.ipynb) |
| [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) | [Binder](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=08-Custom%20Environment.ipynb) |
| [Python Interface](https://docs.rl.tools/09-Python%20Interface.html) | [Colab](https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb) |
# Repository Structure
To build the examples from source (either in Docker or natively), first clone the repository.
Instead of cloning all submodules with `git clone --recursive`, which takes a lot of space and bandwidth, we recommend cloning the main repo (which contains all the standalone code for RLtools) and then cloning the required sets of submodules later:
```
git clone https://github.com/rl-tools/rl-tools.git rl_tools
```
#### Cloning submodules
There are five sets of submodules:
1. External dependencies (in `external/`)
   * E.g. HDF5 for checkpointing, TensorBoard for logging, or MuJoCo for the simulation of contact dynamics
2. Examples/code for embedded platforms (in `embedded_platforms/`)
3. Redistributable dependencies (in `redistributable/`)
4. Test dependencies (in `tests/lib`)
5. Test data (in `tests/data`)

These sets of submodules can be cloned incrementally and independently of each other.
For most use cases (e.g. most of the Docker examples) you should clone the submodules for the external dependencies:
```
cd rl_tools
git submodule update --init --recursive -- external
```

The submodules for the embedded platforms, the redistributable binaries, and the test dependencies/data can be cloned in the same fashion (by replacing `external` with the appropriate folder from the enumeration above).
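For instance, the remaining sets could be cloned like this (a sketch; the folder names are taken from the enumeration above):

```
# Clone the remaining sets of submodules as needed
git submodule update --init --recursive -- embedded_platforms
git submodule update --init --recursive -- redistributable
git submodule update --init --recursive -- tests/lib tests/data
```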
Note: For the redistributable dependencies and the test data, make sure `git-lfs` is installed (e.g. `sudo apt install git-lfs` on Ubuntu) and activated (`git lfs install`); otherwise only the metadata of the blobs is downloaded.

### Python Interface
We provide Python bindings that are available as `rltools` through PyPI (the pip package index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.
```
pip install rltools gymnasium
```
Usage:
```
from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d

def env_factory():
    # Create the Pendulum environment and rescale its action space to [-1, 1]
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)
state = sac.State(seed)

# Step the training loop until it signals completion
finished = False
while not finished:
    finished = state.step()
```
You can find more details in the [Python Interface documentation](https://docs.rl.tools/09-Python%20Interface.html) and in the repository [rl-tools/python-interface](https://github.com/rl-tools/python-interface).

## Embedded Platforms
### Inference & Training
- [iOS](https://github.com/rl-tools/ios)
- [Teensy](./embedded_platforms)
### Inference
- [Crazyflie](embedded_platforms/crazyflie)
- [ESP32](embedded_platforms)
- [PX4](embedded_platforms)

## Naming Convention
We use `snake_case` for variables/instances, functions, and namespaces, and `PascalCase` for structs/classes. Furthermore, we use upper-case `SNAKE_CASE` for compile-time constants.

## Citing
When using RLtools in academic work, please cite our publication using the following BibTeX entry:
```
@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
```