rl-tools

2.2.0

The Fastest Deep Reinforcement Learning Library
rl-tools/rl-tools

What's New

v2.2.0

2025-10-23T05:42:48Z

To prepare RLtools for mixed-precision training, we introduce a type policy, numeric_types::Policy. Until now it was easy to switch the floating-point type in RLtools because everything depended on the single T parameter. For modern deep learning this is not sufficient: we would like to configure different types for different parts of the models/algorithms (e.g. bf16 for parameters, fp32 for gradient/optimizer state), and a single type parameter cannot express that.

Hence we created numeric_types::Policy to enable flexible type configuration:

using namespace rlt::numeric_types;
using PARAMETER_TYPE_RULE = UseCase<categories::Parameter, float>;
using GRADIENT_TYPE_RULE = UseCase<categories::Gradient, float>;
using TYPE_POLICY = Policy<double, PARAMETER_TYPE_RULE, GRADIENT_TYPE_RULE>;

The TYPE_POLICY is then passed instead of T, e.g.:

using MODEL_CONFIG = rlt::nn_models::mlp::Configuration<TYPE_POLICY, TI, OUTPUT_DIM, NUM_LAYERS, HIDDEN_DIM, ACTIVATION_FUNCTION, ACTIVATION_FUNCTION_OUTPUT>;

In the codebase the TYPE_POLICY is then queried as follows:

using PARAMETER_TYPE = TYPE_POLICY::template GET<categories::Parameter>;
using GRADIENT_TYPE = TYPE_POLICY::template GET<categories::Gradient>;
using OPTIMIZER_TYPE = TYPE_POLICY::template GET<categories::Optimizer>;

This allows for very flexible configuration. If a category is not set (like categories::Optimizer in this example), it falls back to TYPE_POLICY::DEFAULT, which here is double (the first argument). TYPE_POLICY::DEFAULT is also the type that should be used for configuration variables and other variables that do not clearly fall under one of the categories. You can also easily define custom category tags yourself. More about this will be covered in a section of the documentation at https://docs.rl.tools in the future.
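The fallback mechanism can be sketched in plain C++. The following is a simplified re-implementation for illustration only (the names mirror the API above, but RLtools' actual rlt::numeric_types::Policy may be implemented differently):

```cpp
#include <type_traits>

namespace sketch{
    namespace categories{
        struct Parameter{};
        struct Gradient{};
        struct Optimizer{};
    }
    // UseCase binds one category tag to a concrete numeric type
    template <typename T_CATEGORY, typename T_TYPE>
    struct UseCase{
        using CATEGORY = T_CATEGORY;
        using TYPE = T_TYPE;
    };
    // Policy: a default type plus zero or more UseCase rules.
    // GET<C> yields the type of the first rule matching C, else DEFAULT.
    template <typename T_DEFAULT, typename... RULES>
    struct Policy{
        using DEFAULT = T_DEFAULT;
        template <typename C>
        using GET = T_DEFAULT; // no rules left: fall back to the default
    };
    template <typename T_DEFAULT, typename RULE, typename... RULES>
    struct Policy<T_DEFAULT, RULE, RULES...>{
        using DEFAULT = T_DEFAULT;
        template <typename C>
        using GET = std::conditional_t<
            std::is_same_v<C, typename RULE::CATEGORY>,
            typename RULE::TYPE,
            typename Policy<T_DEFAULT, RULES...>::template GET<C>
        >;
    };
}

// Mirrors the example above: parameters and gradients in float, default double
using TYPE_POLICY = sketch::Policy<double,
    sketch::UseCase<sketch::categories::Parameter, float>,
    sketch::UseCase<sketch::categories::Gradient, float>
>;

static_assert(std::is_same_v<TYPE_POLICY::GET<sketch::categories::Parameter>, float>);
// No rule for Optimizer, so the query falls back to DEFAULT (double):
static_assert(std::is_same_v<TYPE_POLICY::GET<sketch::categories::Optimizer>, double>);
```

The recursion walks the rule list at compile time and stops at the first matching category, so later rules cannot override earlier ones for the same tag.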

This is a small API change, but it appears in many places, so we implemented it early (before mixed-precision training itself is implemented) to minimize confusion later, when we expect these kinds of APIs to be more stable.

Currently, the advice is to just create a `using TYPE_POLICY = rlt::numeric_types::Policy<T>;` (with `T` being `float` or `double`) and pass it everywhere. You might encounter errors when trying to access e.g. some `SPEC::T`, which you should be able to replace with `SPEC::TYPE_POLICY::DEFAULT` for identical behavior. In general, the behavior should be exactly identical as long as you configure the same floating-point type you used for `T` before.

Paper on arXiv | Live demo (browser) | Documentation | Zoo | Studio

Join our Discord!

Trained on a 2020 MacBook Pro (M1) using RLtools SAC and TD3 (respectively)

Trained on a 2020 MacBook Pro (M1) using RLtools PPO/Multi-Agent PPO

Trained in 18s on a 2020 MacBook Pro (M1) using RLtools TD3

Benchmarks

Benchmarks of training the Pendulum swing-up using different RL libraries (PPO and SAC respectively)

Benchmarks of training the Pendulum swing-up on different devices (SAC, RLtools)

Benchmarks of the inference frequency for a two-layer [64, 64] fully-connected neural network across different microcontrollers (types and architectures).

Quick Start

Clone this repo, then build a Zoo example:

g++ -std=c++17 -O3 -ffast-math -I include src/rl/zoo/l2f/sac.cpp

Run it with ./a.out 1337 (the number is the seed), then run ./tools/serve.sh to visualize the results. Open http://localhost:8000 and navigate to the ExTrack UI to watch the quadrotor flying.

  • macOS: Append -framework Accelerate -DRL_TOOLS_BACKEND_ENABLE_ACCELERATE for fast training (~4s on M3)
  • Ubuntu: Use apt install libopenblas-dev and append -lopenblas -DRL_TOOLS_BACKEND_ENABLE_OPENBLAS (~6s on Zen 5).

Algorithms

Algorithm Example
TD3 Pendulum, Racing Car, MuJoCo Ant-v4, Acrobot
PPO Pendulum, Racing Car, MuJoCo Ant-v4 (CPU), MuJoCo Ant-v4 (CUDA)
Multi-Agent PPO Bottleneck
SAC Pendulum (CPU), Pendulum (CUDA), Acrobot

Projects Based on RLtools

Getting Started

⚠️ Note: Check out Getting Started in the documentation for a more thorough guide

To get started implementing your own environment please refer to rl-tools/example

Documentation

The documentation is available at docs.rl.tools and consists of C++ notebooks. You can also run them locally to tinker around:

docker run -p 8888:8888 rltools/documentation

After running the Docker container, open the link that is displayed in the CLI (http://127.0.0.1:8888/...) in your browser and enjoy tinkering!

Chapter Interactive Notebook
Overview -
Getting Started -
Containers Binder
Multiple Dispatch Binder
Deep Learning Binder
CPU Acceleration Binder
MNIST Classification Binder
Deep Reinforcement Learning Binder
The Loop Interface Binder
Custom Environment Binder
Python Interface Run Example on Colab

Python Interface

We provide Python bindings that are available as rltools on PyPI (the Python Package Index). Note that using Python Gym environments can slow down training significantly compared to native RLtools environments.

pip install rltools gymnasium

Usage:

from rltools import SAC
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

seed = 0xf00d
def env_factory():
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, -1, 1)
    env.reset(seed=seed)
    return env

sac = SAC(env_factory)
state = sac.State(seed)

finished = False
while not finished:
    finished = state.step()

You can find more details in the Python Interface documentation and from the repository rl-tools/python-interface.

Embedded Platforms

Inference & Training

Inference

Naming Convention

We use snake_case for variables/instances, functions, and namespaces, and PascalCase for structs/classes. Furthermore, we use upper-case SNAKE_CASE for compile-time constants.

Citing

When using RLtools in academic work, please cite our publication using the following BibTeX entry:

@article{eschmann_rltools_2024,
  author  = {Jonas Eschmann and Dario Albani and Giuseppe Loianno},
  title   = {RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {301},
  pages   = {1--19},
  url     = {http://jmlr.org/papers/v25/24-0248.html}
}
