An open API service indexing awesome lists of open source software.

https://github.com/ethanluoyc/corax

Corax: Core RL in JAX
https://github.com/ethanluoyc/corax

jax machine-learning reinforcement-learning

Last synced: 5 months ago
JSON representation

Corax: Core RL in JAX

Awesome Lists containing this project

README

          

# Corax: Core RL in JAX
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![test](https://github.com/ethanluoyc/corax/actions/workflows/test.yml/badge.svg)](https://github.com/ethanluoyc/corax/actions/workflows/test.yml)

**[Installation](#installation)** |
**[Examples](#examples)** |
**[Agents](#agents)** |
**[Datasets](#datasets)**

Corax is a library for reinforcement learning algorithms in JAX. It aims at providing
modular, pure and functional components for RL algorithms that can be easily used in
different training loops and accelerator configurations. Currently, we are exploring the
design of a LearnerCore and ActorCore design that allows easy composition and scaling of
RL algorithms. At the same time, Corax aims to provide strong baseline agents that can
be forked and customized for future RL research.

Corax starts as a fork of the
[dm-acme](https://github.com/google-deepmind/acme/tree/master) library while aiming to
provide a better experience for researchers working on Online/offline RL in JAX. Future
development of Corax may diverge from the design in Acme.

## Installation
You can install Corax with
```
pip install 'git+https://github.com/ethanluoyc/corax#egg=corax[tf,jax]'
```

To use Corax with GPU, you need to install JAX with GPU support. Follow the instructions
[here](https://jax.readthedocs.io/en/latest/installation.html) for how to install JAX
with GPU support.

### Note on TensorFlow dependency
The base corax package does not depend on specific deep
learning frameworks. However, the JAX agent depends on TensorFlow for efficient
data-processing.

We provide optional extras for installing `tensorflow-cpu` and
compatible versions of [TensorFlow
Probability](https://github.com/tensorflow/probability) and
[Reverb](https://github.com/google-deepmind/reverb/tree/master).

However, should that be
incompatible with your own dependency requirements, you can optionally specify these
dependencies yourself and opt-out our extras. Check out the
[pyproject.toml](./pyproject.toml) for examples on how to specify compatible TensorFlow
versions.
Here is an example workflow of determining the compatible versions of TensorFlow, TensorFlow-Probability and Reverb. Assume that you will use `tensorflow-cpu~=2.13.0`, then
1. Looking at https://github.com/tensorflow/probability/releases,
the tensorflow-probability version compatible with `tensorflow~=2.13.0` is `0.21.0`.
2. Looking at https://github.com/google-deepmind/reverb/tree/master#reverb-releases, the
`dm-reverb` version compatible is `0.12.0`.

Therefore, as an application developer, you should put the following in your `requirements.txt`
```bash
tensorflow-cpu~=2.13.0
tensorflow-probability~=0.21.0
dm-reverb~=0.12.0
```

If you use `dm-launchpad`. The workflow is similar, although as of 17 Oct, 2023 Launchpad
does not provide an official build for `tensorflow 2.13.0`. We however, have an unofficial
manylinux build for Python 3.9 and 3.10 available at
https://github.com/ethanluoyc/launchpad/releases/tag/v0.6.0rc0. If you intend to use
this version, you should include in your `requirements.txt`
```bash
# Use the exact link to the wheel file for your Python version
dm-launchpad @ https://github.com/ethanluoyc/launchpad/releases/download/v0.6.0rc0/dm_launchpad-0.6.0rc0-cp39-cp39-manylinux2014_x86_64.whl
```
We currently do not have build for tensorflow 2.14.0 due to
https://github.com/google-deepmind/launchpad/issues/44.

## Examples
Examples can be found in [projects](projects/).

## Development

```bash
git clone https://github.com/ethanluoyc/jax
cd corax
# Create a virtual environment with the method of your choice.
python3 -m venv .venv
source .venv/bin/activate
# Then run
pip install -e '.[tf,jax,test,dev]'
# Install pre-commit hooks if you intend to create PRs.
pre-commit install
# Install the baselines by running
pip install -r projects/baselines/requirements.txt -e projects/baselines
```

## Agents
Corax includes high-quality implementation of many popular RL agents. These agents are
meant to be forked and customized for future RL research.

The implementation has been used in numerous research projects and we intend to provide
benchmark results for these agents in the future.

Corax currently implements the following agents JAX:

| Agent | Paper | Code |
|----------------------|--------------------------|----------------------------------------------------------------|
| CalQL | [Nakamoto et al., 2023] | [calql](corax/agents/jax/calql/) |
| CQL | [Kumar et al., 2020] | [calql](corax/agents/jax/calql/) |
| IQL | [Kostrikov et al., 2021] | [iql](corax/agents/jax/iql/) |
| RLPD | [Ball et al., 2023] | [redq](corax/agents/jax/redq/) |
| Decision Transformer | [Chen et al., 2021a] | [decision_transformer](corax/agents/jax/decision_transformer/) |
| DrQ-v2(-BC) | [Yarats et al., 2021] | [drq_v2](corax/agents/jax/drq_v2/) |
| ORIL | [Zolna et al., 2020] | [oril](corax/agents/jax/oril/) |
| OTR | [Luo et al., 2023] | [otr](corax/agents/jax/otr/) |
| REDQ | [Chen et al., 2021b] | [redq](corax/agents/jax/redq/) |
| TD3 | [Fujimoto et al., 2018] | [td3](corax/agents/jax/td3/) |
| TD3-BC | [Fujimoto et al., 2021] | [td3](corax/agents/jax/td3/) |
| TD-MPC | [Hansen et al., 2021] | [tdmpc](corax/agents/jax/tdmpc/) |

More agents, including those implemented in
[Magi](https://github.com/ethanluoyc/magi/tree/main/magi) may be added in the future.
Contributions to include new agents are welcome!

## Datasets
For online RL, Corax uses [Reverb](https://github.com/google-deepmind/reverb/) for online RL agents.

When working with offline RL, existing datasets provided by the community may come in
different formats. It can be time-consuming to integrate existing algorithms with
different datasets.

Therefore, for offline RL, Corax provides additional
[TFDS](https://github.com/tensorflow/datasets/tree/master) dataset builders that can
build datasets stored in [RLDS](https://github.com/google-research/rlds) format. This
allows easily running the same offline RL algorithm on offline RL datasets in a
consistent manner. You may want to check out the
[list](https://github.com/google-research/rlds/tree/main#available-datasets) of datasets
officially supported by the TFDS/RLDS team.

In addition to the official RLDS datasets, the following datasets can be built with
Corax:

| Dataset | Paper | Code |
|-----------------|--------------------------|-----------------------------------------------------|
| V-D4RL | [Lu et al., 2023] | [vd4rl](corax/datasets/tfds/vd4rl/) |
| Watch and Match | [Haldar et al., 2022] | [rot](corax/datasets/tfds/rot/) |
| ExoRL | [Yarats et al., 2022] | [exorl](corax/datasets/tfds/exorl/) |
| GWIL | [Fickinger et al., 2022] | [gwil](corax/datasets/tfds/gwil/) |
| Adroit Binary | [Nair et al., 2022] | [adroit_binary](corax/datasets/tfds/adroit_binary/) |

NOTE: Some of these datasets do not yet cover all splits provided by the original
dataset. They will be added as the need arises.

## Acknowledgements
We would like to thank the [Acme](https://github.com/google-deepmind/acme) authors who
have provided a great starting point for Corax. Without them, Corax would not exist as a
significant portion of the current code is forked from them. You should check out Acme
if you are looking for more RL agent implementations.

We would like to thank the authors of the original papers for open-sourcing their code
which has been a great help in our re-implementation.

[Nakamoto et al., 2023]: https://arxiv.org/abs/2303.05479
[Chen et al., 2021a]: https://arxiv.org/abs/2106.01345
[Yarats et al., 2021]: https://arxiv.org/abs/2107.09645
[Zolna et al., 2020]: https://arxiv.org/pdf/2011.13885.pdf
[Kostrikov et al., 2021]: https://openreview.net/forum?id=68n2s9ZJWF8
[Chen et al., 2021b]: https://arxiv.org/abs/2101.05982
[Ball et al., 2023]: https://arxiv.org/abs/2302.02948
[Fujimoto et al., 2018]: https://arxiv.org/abs/1802.09477
[Fujimoto et al., 2021]: https://arxiv.org/abs/2106.06860.pdf
[Hansen et al., 2021]: https://arxiv.org/abs/2203.04955
[Kumar et al., 2020]: https://arxiv.org/abs/2006.04779
[Luo et al., 2023]: https://arxiv.org/abs/2303.13971

[Lu et al., 2023]: https://arxiv.org/abs/1806.06920
[Haldar et al., 2022]: https://openreview.net/forum?id=ZUtgUA0Fuwd
[Yarats et al., 2022]: https://arxiv.org/abs/2201.13425
[Fickinger et al., 2022]: https://arxiv.org/abs/2110.03684
[Nair et al., 2022]: https://arxiv.org/abs/2006.09359