[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
![tests](https://github.com/LucasAlegre/morl-baselines/workflows/Python%20tests/badge.svg)
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/LucasAlegre/morl-baselines/blob/main/LICENSE)
[![Discord](https://img.shields.io/discord/999693014618362036?label=discord)](https://discord.gg/ygmkfnBvKA)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)


# MORL-Baselines

MORL-Baselines is a library of Multi-Objective Reinforcement Learning (MORL) algorithms.
This repository aims to contain reliable implementations of MORL algorithms in PyTorch.

It strictly follows the [MO-Gymnasium](https://github.com/Farama-Foundation/mo-gymnasium) API, which differs from the standard [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) API only in that the environment returns a numpy array of rewards, one component per objective.
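For instance, stepping an MO-Gymnasium environment is identical to stepping a Gymnasium one, except that the reward is a numpy array with one entry per objective (a minimal sketch, assuming `mo-gymnasium` is installed and using its `deep-sea-treasure-v0` environment):

```python
import mo_gymnasium as mo_gym

# A 2-objective environment: treasure value and time penalty
env = mo_gym.make("deep-sea-treasure-v0")
obs, info = env.reset()

# Same 5-tuple as Gymnasium, but `vector_reward` is a np.ndarray of shape (num_objectives,)
obs, vector_reward, terminated, truncated, info = env.step(env.action_space.sample())
print(vector_reward)  # e.g. array([ 0., -1.])
```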

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, we suggest reading [A practical guide to multi-objective reinforcement learning and planning](https://link.springer.com/article/10.1007/s10458-022-09552-y). An overview of some techniques used in various MORL algorithms is also provided in [Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework](https://doi.org/10.1613/jair.1.15702).

A tutorial on MO-Gymnasium and MORL-Baselines is also available: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ByjuUp8-CJeh1giPOACqPGiglPxDnlSq?usp=sharing)

## Features

* Single and multi-policy algorithms under both SER and ESR criteria are implemented.
* All algorithms follow the [MO-Gymnasium](https://www.github.com/Farama-Foundation/mo-gymnasium) API.
* Performances are automatically reported in [Weights and Biases](https://wandb.ai/) dashboards.
* Linting and formatting are enforced by pre-commit hooks.
* Code is well documented.
* All algorithms are automatically tested.
* Utility functions are provided, e.g., Pareto pruning, experience buffers, etc. (a small illustration of Pareto pruning follows this list).
* Performances have been tested and reported in a reproducible manner.
* Hyperparameter optimization is available.
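As a rough illustration of what the Pareto-pruning utilities do conceptually, here is a small self-contained numpy sketch that filters out dominated points; this is an illustrative stand-in, not the library's own implementation (which lives under `common/`):

```python
import numpy as np

def pareto_prune(points: np.ndarray) -> np.ndarray:
    """Keep only non-dominated points (assuming all objectives are maximized)."""
    keep = []
    for i, p in enumerate(points):
        # p is dominated if some other point is >= on every objective and > on at least one
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(p)
    return np.array(keep)

# Example: three candidate policy evaluations with 2 objectives
points = np.array([[1.0, 5.0], [2.0, 3.0], [1.0, 3.0]])
print(pareto_prune(points))  # [[1. 5.] [2. 3.]] -- [1. 3.] is dominated by both others
```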

## Implemented Algorithms

| **Name** | Single/Multi-policy | ESR/SER | Observation space | Action space | Paper |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|-----------------------------|-------------------|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| [GPI-LS + GPI-PD](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/gpi_pd/gpi_pd.py) | Multi | SER | Continuous | Discrete / Continuous | [Paper and Supplementary Materials](https://arxiv.org/abs/2301.07784) |
| [MORL/D](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/morld/morld.py) | Multi | / | / | / | [Paper](https://doi.org/10.1613/jair.1.15702) |
| [Envelope Q-Learning](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/envelope/envelope.py) | Multi | SER | Continuous | Discrete | [Paper](https://arxiv.org/pdf/1908.08342.pdf) |
| [CAPQL](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/capql/capql.py) | Multi | SER | Continuous | Continuous | [Paper](https://openreview.net/pdf?id=TjEzIsyEsQ6) |
| [PGMORL](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/pgmorl/pgmorl.py) [1](#f1) | Multi | SER | Continuous | Continuous | [Paper](https://people.csail.mit.edu/jiex/papers/PGMORL/paper.pdf) / [Supplementary Materials](https://people.csail.mit.edu/jiex/papers/PGMORL/supp.pdf) |
| [Pareto Conditioned Networks (PCN)](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/pcn/pcn.py) | Multi | SER/ESR [2](#f2) | Continuous | Discrete / Continuous | [Paper](https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p1110.pdf) |
| [Pareto Q-Learning](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/pareto_q_learning/pql.py) | Multi | SER | Discrete | Discrete | [Paper](https://jmlr.org/papers/volume15/vanmoffaert14a/vanmoffaert14a.pdf) |
| [MO Q learning](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/single_policy/ser/mo_q_learning.py) | Single | SER | Discrete | Discrete | [Paper](https://www.researchgate.net/publication/235698665_Scalarized_Multi-Objective_Reinforcement_Learning_Novel_Design_Techniques) |
| [MPMOQLearning](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/multi_policy_moqlearning/mp_mo_q_learning.py) (outer loop MOQL) | Multi | SER | Discrete | Discrete | [Paper](https://www.researchgate.net/publication/235698665_Scalarized_Multi-Objective_Reinforcement_Learning_Novel_Design_Techniques) |
| [Optimistic Linear Support (OLS)](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/multi_policy/ols/ols.py) | Multi | SER | / | / | Section 3.3 of the [thesis](http://roijers.info/pub/thesis.pdf) |
| [Expected Utility Policy Gradient (EUPG)](https://github.com/LucasAlegre/morl-baselines/blob/main/morl_baselines/single_policy/esr/eupg.py) | Single | ESR | Discrete | Discrete | [Paper](https://www.researchgate.net/publication/328718263_Multi-objective_Reinforcement_Learning_for_the_Expected_Utility_of_the_Return) |

:warning: Some of the algorithms have limited features.

1: Currently, PGMORL is limited to environments with 2 objectives.

2: PCN assumes environments with deterministic transitions.
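The file paths in the table above map directly to importable modules. Below is a rough usage sketch for one of the listed algorithms; the class name `EUPG`, the `fishwood-v0` environment ID, and the constructor/`train` arguments are assumptions based on the repository's general pattern, so consult `examples/` and the documentation for exact signatures:

```python
import numpy as np
import mo_gymnasium as mo_gym

# Module path taken from the table above; the class name is assumed to match the algorithm.
from morl_baselines.single_policy.esr.eupg import EUPG

env = mo_gym.make("fishwood-v0")  # small discrete environment with 2 objectives

# EUPG optimizes the expected utility of the return (ESR criterion), so it is
# parameterized by a (possibly non-linear) utility/scalarization function.
agent = EUPG(
    env,
    scalarization=lambda reward, weights: np.min(reward),  # assumed signature
    weights=np.ones(2),
    log=False,  # set to True to report results to Weights & Biases
)
agent.train(total_timesteps=100_000, eval_env=mo_gym.make("fishwood-v0"), eval_freq=1_000)
```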

## Benchmarking

MORL-Baselines participates in [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark), which contains tracked experiments from popular RL libraries such as [cleanRL](https://github.com/vwxyzjn/cleanrl) and [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3).

We have run experiments with our algorithms on various environments from [MO-Gymnasium](https://www.github.com/Farama-Foundation/mo-gymnasium). The results can be found here: https://wandb.ai/openrlbenchmark/MORL-Baselines. An issue tracking all the settings is available at [#43](https://github.com/LucasAlegre/morl-baselines/issues/43). Design documentation for the experimentation protocol is also available on our [documentation website](https://lucasalegre.github.io/morl-baselines/algos/performances/).

An example visualization of our dashboards with Pareto support is shown below:
*(Image: WandB dashboards)*

## Structure

As much as possible, this repo tries to follow a single-file implementation rule for each algorithm. The repo's structure is as follows:

* `examples/` contains a set of examples showing how to use MORL-Baselines with [MO-Gymnasium](https://www.github.com/Farama-Foundation/mo-gymnasium) environments.
* `common/` contains implementations of recurring concepts: replay buffers, neural networks, etc. See the [documentation](https://lucasalegre.github.io/morl-baselines/) for more details.
* `multi_policy/` contains the implementations of multi-policy algorithms.
* `single_policy/` contains the implementations of single-policy algorithms (ESR and SER).

## Citing the Project

If you use MORL-Baselines in your research, please cite our [NeurIPS 2023 paper](https://openreview.net/pdf?id=jfwRLudQyj):

```bibtex
@inproceedings{felten_toolkit_2023,
    author = {Felten, Florian and Alegre, Lucas N. and Now{\'e}, Ann and Bazzan, Ana L. C. and Talbi, El Ghazali and Danoy, Gr{\'e}goire and Silva, Bruno Castro da},
    title = {A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning},
    booktitle = {Proceedings of the 37th Conference on Neural Information Processing Systems ({NeurIPS} 2023)},
    year = {2023}
}
```

## Maintainers

MORL-Baselines is currently maintained by [Florian Felten](https://ffelten.github.io/) (@ffelten) and [Lucas N. Alegre](https://www.inf.ufrgs.br/~lnalegre/) (@LucasAlegre).

## Contributing

This repository is open to contributions and we are always happy to receive new algorithms, bug fixes, or features. If you want to contribute, you can join our [Discord server](https://discord.gg/ygmkfnBvKA) and discuss your ideas with us. You can also open an issue or a pull request directly.

## Acknowledgements

* Willem Röpke, for his implementation of Pareto Q-Learning (@wilrop)
* Mathieu Reymond, for providing us with the original implementation of PCN.
* Denis Steckelmacher and Conor F. Hayes, for providing us with the original implementation of EUPG.