{"id":13528428,"url":"https://github.com/rl-tools/rl-tools","last_synced_at":"2025-11-03T19:30:36.757Z","repository":{"id":206613976,"uuid":"717287064","full_name":"rl-tools/rl-tools","owner":"rl-tools","description":"A Fast, Portable Deep Reinforcement Learning Library for Continuous Control","archived":false,"fork":false,"pushed_at":"2024-05-23T00:41:18.000Z","size":4672,"stargazers_count":135,"open_issues_count":4,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-05-23T00:55:16.033Z","etag":null,"topics":["continuous-control","cpp","deep-learning","mujoco","reinforcement-learning","robotics","tinyml","tinyrl"],"latest_commit_sha":null,"homepage":"https://rl.tools","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rl-tools.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-11T02:29:10.000Z","updated_at":"2024-05-27T18:53:59.627Z","dependencies_parsed_at":null,"dependency_job_id":"d01ded0f-788a-4a1d-8468-a4dfba2f5316","html_url":"https://github.com/rl-tools/rl-tools","commit_stats":{"total_commits":1529,"total_committers":3,"mean_commits":509.6666666666667,"dds":0.03662524525833877,"last_synced_commit":"a5e2a3a300c021d59865ea40e8a990709c87e2e4"},"previous_names":["rl-tools/rl-tools"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rl-tools%2Frl-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rl-tools%2Frl-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rl-tools%2Frl-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rl-tools%2Frl-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rl-tools","download_url":"https://codeload.github.com/rl-tools/rl-tools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222728216,"owners_count":17029678,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["continuous-control","cpp","deep-learning","mujoco","reinforcement-learning","robotics","tinyml","tinyrl"],"created_at":"2024-08-01T07:00:19.037Z","updated_at":"2025-11-03T19:30:36.699Z","avatar_url":"https://github.com/rl-tools.png","language":"C++","readme":"\u003cdiv align=\"center\"\u003e\n  \u003ccenter\u003e\u003ch1\u003e\u003cspan style=\"color:#7DB9B6\"\u003eRLtools\u003c/span\u003e: The Fastest Deep Reinforcement Learning Library\u003c/h1\u003e\u003c/center\u003e\n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/rl-tools/media/blob/master/overview.jpg\"/ width=500\u003e  
</div>

<p align="center">
  <a href="https://arxiv.org/abs/2306.03530">Paper on arXiv</a> | <a href="https://rl.tools">Live demo (browser)</a> | <a href="https://docs.rl.tools">Documentation</a> | <a href="https://zoo.rl.tools">Zoo</a> | <a href="https://studio.rl.tools">Studio</a>
  <br><br>
  <a href="https://github.com/rl-tools/rl-tools/actions/workflows/tests-backend.yml">
  <img src="https://github.com/rl-tools/rl-tools/actions/workflows/tests-backend.yml/badge.svg" alt="Tests">
  </a>
  <a href="https://codecov.io/gh/rl-tools/rl-tools">
  <img src="https://codecov.io/gh/rl-tools/rl-tools/graph/badge.svg?token=3TJZ635O8V" alt="Coverage">
  </a>
  <a href="https://docs.rl.tools">
  <img src="https://img.shields.io/badge/Documentation-Read%20the%20Docs-blue.svg" alt="Documentation">
  </a>
  <br>
  <a href="https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb">
  <img src="https://mybinder.org/badge_logo.svg" alt="Run tutorials on Binder">
  </a>
  <a href="https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Run Example on Colab">
  </a>
  <br>
  <a href="https://discord.gg/kbvxCavb5h">
  <img src="https://img.shields.io/discord/1194228521216778250?label=Discord&logo=discord&logoColor=white&color=7289da" alt="Join our Discord!">
  </a>
</p>

<div align="center">
<img src="https://github.com/rl-tools/media/blob/master/acrobot-swing-up-sac.gif" alt="animated" height='200'>
<img src="https://github.com/rl-tools/media/blob/master/racing_car.gif" alt="animated" height='200'>
</div>
<div align="center">
    Trained on a 2020 MacBook Pro (M1) using <span style="color:#7DB9B6">RLtools</span> SAC and TD3 (respectively)
</div>
<br>

<div align="center">
  <a href="https://github.com/rl-tools/rl-tools/blob/master/src/rl/environments/mujoco/ant/ppo/cpu/training.h">
    <img src="https://github.com/rl-tools/media/blob/master/rl_tools_mujoco_ant_ppo.gif" alt="animated" height='200'>
  </a>
  <a href="https://github.com/rl-tools/rl-tools/blob/e033cc1d739f66d18ef685233d8dd84dddb3fe69/src/rl/zoo/ppo/bottleneck-v0.h">
    <img src="https://github.com/rl-tools/media/blob/master/bottleneck.gif" alt="animated" height='200'>
  </a>
</div>

<div align="center">
    Trained on a 2020 MacBook Pro (M1) using <span style="color:#7DB9B6">RLtools</span> PPO/Multi-Agent PPO
</div>
<br>

<div align="center">
  <a href="https://github.com/arplaboratory/learning-to-fly">
    <img src="https://github.com/rl-tools/media/blob/master/learning-to-fly-in-seconds.gif" alt="animated" width='350'>
  </a>
</div>

<div align="center">
    Trained in 18s on a 2020 MacBook Pro
(M1) using <span style="color:#7DB9B6">RLtools</span> TD3
</div>
<br>

## Benchmarks

<div align="center">
<img src="https://github.com/rl-tools/media/blob/master/benchmark_horizontal_ppo.png" width=300>
<img src="https://github.com/rl-tools/media/blob/master/benchmark_horizontal_sac.png" width=300>
</div>
<div align="center">
    Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC, respectively)
</div>
<br>
<div align="center">
<img src="https://github.com/rl-tools/media/blob/master/benchmark_vertical.png" width=350>
</div>
<div align="center">
    Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)
</div>

<br>
<div align="center">
<img src="https://github.com/rl-tools/media/blob/master/microcontroller_inference.png" width=600>
</div>
<div align="center">
    Benchmarks of the inference frequency of a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures)
</div>

## Quick Start
Clone this repo, then build a Zoo example:
```
g++ -std=c++17 -Ofast -I include src/rl/zoo/l2f/sac.cpp
```
Run it with a seed, e.g. `./a.out 1337`, then run `python3 -m http.server` to serve the results. Open `http://localhost:8000` in your browser and navigate to the ExTrack UI to watch the quadrotor fly.

- **macOS**: Append `-framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE` for fast training (~4s on an M3)
- **Ubuntu**: Install OpenBLAS (`apt install libopenblas-dev`) and append `-lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS` (~6s on Zen 5)

## Algorithms
| Algorithm | Example |
|-----------|---------|
| **TD3** | [Pendulum](./src/rl/environments/pendulum/td3/cpu/standalone.cpp), [Racing Car](./src/rl/environments/car/car.cpp), [MuJoCo Ant-v4](./src/rl/environments/mujoco/ant/td3/training.h), [Acrobot](./src/rl/environments/acrobot/td3/acrobot.cpp) |
| **PPO** | [Pendulum](./src/rl/environments/pendulum/ppo/cpu/training.cpp), [Racing Car](./src/rl/environments/car/training_ppo.h), [MuJoCo Ant-v4 (CPU)](./src/rl/environments/mujoco/ant/ppo/cpu/training.h), [MuJoCo Ant-v4 (CUDA)](./src/rl/environments/mujoco/ant/ppo/cuda/training_ppo.cu) |
| **Multi-Agent PPO** | [Bottleneck](./src/rl/zoo/bottleneck-v0/ppo.h) |
| **SAC** | [Pendulum (CPU)](./src/rl/environments/pendulum/sac/cpu/training.cpp), [Pendulum (CUDA)](./src/rl/environments/pendulum/sac/cuda/sac.cu), [Acrobot](./src/rl/environments/acrobot/sac/acrobot.cpp) |

## Projects Based on <span style="color:#7DB9B6">RLtools</span>
- Learning to Fly in Seconds:
[GitHub](https://github.com/arplaboratory/learning-to-fly) / [arXiv](https://arxiv.org/abs/2311.13081) / [YouTube](https://youtu.be/NRD43ZA1D-4) / [IEEE Spectrum](https://spectrum.ieee.org/amp/drone-quadrotor-2667196800)
- Data-Driven System Identification of Quadrotors Subject to Motor Delays: [GitHub](https://github.com/arplaboratory/data-driven-system-identification) / [arXiv](https://arxiv.org/abs/2404.07837) / [YouTube](https://youtu.be/G3WGthRx2KE) / [Project Page](https://sysid.tools)

# Getting Started
> **⚠️ Note**: Check out [Getting Started](https://docs.rl.tools/getting_started.html) in the documentation for a more thorough guide.

A simple example of how to implement your own environment and train a policy using PPO:

Clone and check out:
```
git clone https://github.com/rl-tools/example
cd example
git submodule update --init external/rl_tools
```
Build and run:
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
./my_pendulum
```

Note that this example has no external dependencies and should work on any system with CMake and a C++17 compiler.
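The example trains a policy on a minimal pendulum environment. As a rough, self-contained sketch of what such an environment boils down to, a plain state struct plus free functions for dynamics and reward, consider the following; the names and the simplified dynamics are purely illustrative and not the actual RLtools interface, which is covered in the [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) chapter of the documentation:
```
#include <cmath>
#include <cstdio>

constexpr double PI = 3.14159265358979323846;

// State of a toy pendulum: angle (0 = upright) and angular velocity.
// All names here (MyPendulum, step, reward) are illustrative only.
struct MyPendulum{
    double theta;
    double theta_dot;
};

// One explicit-Euler integration step under a torque action in [-1, 1]
void step(MyPendulum& state, double action){
    constexpr double G = 9.81, LENGTH = 1.0, MAX_TORQUE = 2.0, DT = 0.05;
    double torque = MAX_TORQUE * action;
    state.theta_dot += (G / LENGTH * std::sin(state.theta) + torque) * DT;
    state.theta += state.theta_dot * DT;
}

// Negative cost: penalize deviation from upright, velocity, and effort
double reward(const MyPendulum& state, double action){
    return -(state.theta * state.theta
             + 0.1 * state.theta_dot * state.theta_dot
             + 0.001 * action * action);
}

int main(){
    MyPendulum state{PI, 0.0}; // start hanging straight down
    for(int i = 0; i < 10; i++){
        step(state, 1.0); // apply a constant torque
        std::printf("theta: % .3f reward: % .3f\n", state.theta, reward(state, 1.0));
    }
}
```
The real interface differs in its details (e.g. it is parameterized over the device and data types); refer to the chapter linked above for the exact function signatures.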
# Documentation
The documentation is available at [docs.rl.tools](https://docs.rl.tools) and consists of C++ notebooks. You can also run them locally to tinker around:

```
docker run -p 8888:8888 rltools/documentation
```
After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

| Chapter | Interactive Notebook |
|---------|----------------------|
| [Overview](https://docs.rl.tools/overview.html) | - |
| [Getting Started](https://docs.rl.tools/getting_started.html) | - |
| [Containers](https://docs.rl.tools/01-Containers.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb) |
| [Multiple Dispatch](https://docs.rl.tools/02-Multiple%20Dispatch.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=02-Multiple%20Dispatch.ipynb) |
| [Deep Learning](https://docs.rl.tools/03-Deep%20Learning.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=03-Deep%20Learning.ipynb) |
| [CPU Acceleration](https://docs.rl.tools/04-CPU%20Acceleration.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=04-CPU%20Acceleration.ipynb) |
| [MNIST Classification](https://docs.rl.tools/05-MNIST%20Classification.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=05-MNIST%20Classification.ipynb) |
| [Deep Reinforcement Learning](https://docs.rl.tools/06-Deep%20Reinforcement%20Learning.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=06-Deep%20Reinforcement%20Learning.ipynb) |
| [The Loop Interface](https://docs.rl.tools/07-The%20Loop%20Interface.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=07-The%20Loop%20Interface.ipynb) |
| [Custom Environment](https://docs.rl.tools/08-Custom%20Environment.html) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=08-Custom%20Environment.ipynb) |
| [Python Interface](https://docs.rl.tools/09-Python%20Interface.html) | [![Run Example on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb) |
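A central idea covered in the Multiple Dispatch chapter is that operations are overloaded on device tag types, so the compiler statically selects a specialized implementation. The following minimal, self-contained sketch illustrates the general pattern only; the device tags and the `scale` function are made up for illustration and are not the actual RLtools API:
```
#include <cstdio>
#include <cstddef>

namespace devices{
    struct CPU{};      // hypothetical tag for the generic implementation
    struct CPU_BLAS{}; // hypothetical tag for an accelerated backend
}

// Generic fallback: a plain loop that works anywhere
template <typename T, std::size_t N>
void scale(devices::CPU, T (&values)[N], T factor){
    for(std::size_t i = 0; i < N; i++){
        values[i] *= factor;
    }
}

// Overload selected when the caller passes the BLAS-enabled device tag.
// A real backend would call into BLAS here; this sketch just reports the
// dispatch and reuses the generic kernel.
template <typename T, std::size_t N>
void scale(devices::CPU_BLAS, T (&values)[N], T factor){
    std::puts("dispatched to the (hypothetical) BLAS backend");
    scale(devices::CPU{}, values, factor);
}

int main(){
    float values[3] = {1.0f, 2.0f, 3.0f};
    scale(devices::CPU{}, values, 2.0f);      // resolved at compile time
    scale(devices::CPU_BLAS{}, values, 2.0f); // picks the specialized overload
    std::printf("%.1f %.1f %.1f\n", values[0], values[1], values[2]);
}
```
Because the dispatch is resolved at compile time, the specialized overload can be inlined and incurs no runtime overhead.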
# Repository Structure
To build the examples from source (either in Docker or natively), first clone the repository.
Instead of cloning all submodules using `git clone --recursive`, which takes a lot of space and bandwidth, we recommend cloning the main repo, which contains all the standalone code for <span style="color:#7DB9B6">RLtools</span>, and then cloning the required sets of submodules later:
```
git clone https://github.com/rl-tools/rl-tools.git rl_tools
```
#### Cloning submodules
There are five classes of submodules:
1. External dependencies (in `external/`)
   * E.g. HDF5 for checkpointing, Tensorboard for logging, or MuJoCo for the simulation of contact dynamics
2. Examples/code for embedded platforms (in `embedded_platforms/`)
3. Redistributable dependencies (in `redistributable/`)
4. Test dependencies (in `tests/lib`)
5. Test data (in `tests/data`)

These sets of submodules can be cloned incrementally and independently of each other.
For most use cases (e.g. most of the Docker examples) you should clone the submodules for the external dependencies:
```
cd rl_tools
git submodule update --init --recursive -- external
```

The submodules for the embedded platforms, the redistributable binaries, and the test dependencies/data can be cloned in the same fashion (by replacing `external` with the appropriate folder from the enumeration above).
Note: For the redistributable dependencies and test data, make sure that `git-lfs` is installed (e.g. `sudo apt install git-lfs` on Ubuntu) and activated (`git lfs install`), otherwise only the metadata of the blobs is downloaded.

### Python Interface

We provide Python bindings that are available as `rltools` on PyPI (the pip package index). Note that using Python Gym environments can slow down training significantly compared to native <span style="color:#7DB9B6">RLtools</span> environments.
```
pip install rltools gymnasium
```
Usage:
```
from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d
def env_factory():
    # Create a fresh environment instance with actions rescaled to [-1, 1]
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)  # configure SAC for the environments produced by the factory
state = sac.State(seed) # training state

# Step the training loop until it signals completion
finished = False
while not finished:
    finished = state.step()
```
You can find more details in the [Python Interface documentation](https://docs.rl.tools/09-Python%20Interface.html) and in the repository [rl-tools/python-interface](https://github.com/rl-tools/python-interface).

## Embedded Platforms
### Inference & Training
- [iOS](https://github.com/rl-tools/ios)
- [Teensy](./embedded_platforms)
### Inference
- [Crazyflie](embedded_platforms/crazyflie)
- [ESP32](embedded_platforms)
- [PX4](embedded_platforms)

## Naming Convention
We use `snake_case` for variables/instances, functions, and namespaces, and `PascalCase` for structs/classes. Furthermore, we use upper-case `SNAKE_CASE` for compile-time constants.
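As a quick illustration of the convention (all names below are invented for the example):
```
// namespaces: snake_case
namespace my_environment{
    // compile-time constants: SNAKE_CASE
    constexpr int MAX_EPISODE_LENGTH = 200;
    // structs/classes: PascalCase
    struct EnvironmentState{
        // variables/instances: snake_case
        double angular_velocity;
    };
    // functions: snake_case
    double compute_reward(const EnvironmentState& state){
        return -state.angular_velocity * state.angular_velocity;
    }
}

int main(){
    my_environment::EnvironmentState state{1.0};
    return my_environment::compute_reward(state) <= 0 ? 0 : 1;
}
```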
## Citing
When using <span style="color:#7DB9B6">RLtools</span> in an academic work, please cite our publication using the following BibTeX entry:
```
@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
```