https://github.com/div99/xql
Extreme Q-Learning: Max Entropy RL without Entropy
deep-learning energy-based-model gumbel-distribution offline-rl reinforcement-learning
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/div99/xql
- Owner: Div99
- Created: 2023-01-10T06:06:10.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-14T20:28:12.000Z (over 3 years ago)
- Last Synced: 2025-03-26T15:42:28.446Z (about 1 year ago)
- Topics: deep-learning, energy-based-model, gumbel-distribution, offline-rl, reinforcement-learning
- Language: Python
- Homepage: https://div99.github.io/XQL/
- Size: 46.8 MB
- Stars: 85
- Watchers: 2
- Forks: 10
- Open Issues: 2
# Extreme Q-Learning (X-QL) [Open In Colab](https://colab.research.google.com/github/Div99/XQL/blob/main/Gumbel_Regression.ipynb)
### [**[Project Page](https://div99.github.io/XQL)**]
Official code base for **[Extreme Q-Learning: MaxEnt RL without Entropy](https://arxiv.org/abs/2301.02328)** by [Div Garg](https://divyanshgarg.com/)\*, [Joey Hejna](https://jhejna.github.io)\*, [Matthieu Geist](https://scholar.google.com/citations?user=ectPLEUAAAAJ&hl=en), and [Stefano Ermon](https://cs.stanford.edu/~ermon/).
(*Equal Contribution)
This repo contains code for two novel methods: **Gumbel Regression** and **Extreme Q-learning (X-QL)** formulated in our paper.
**Gumbel Regression** is a novel method that uses simple gradient descent to obtain accurate and unbiased estimates of the (log-)partition function of a distribution.
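As a rough illustration, assuming the Gumbel regression loss takes the form L(v) = E[exp((x - v)/β) - (x - v)/β - 1], the sketch below fits a scalar `v` to samples by plain gradient descent. Setting the derivative to zero gives E[exp((x - v)/β)] = 1, i.e. v = β log E[exp(x/β)], the scaled log-partition function, so the result can be checked against that closed form. The function names, step size, and Gaussian samples are illustrative choices, not code from this repository.

```python
import numpy as np


def gumbel_loss(samples, v, beta):
    """Gumbel regression loss E[exp(z) - z - 1] with z = (x - v) / beta."""
    z = (samples - v) / beta
    return float(np.mean(np.exp(z) - z - 1.0))


def gumbel_regression(samples, beta=1.0, lr=0.5, steps=2000):
    """Fit a scalar v to the samples by plain gradient descent on the Gumbel loss."""
    v = float(np.mean(samples))  # by Jensen's inequality the minimizer is >= the sample mean
    for _ in range(steps):
        z = (samples - v) / beta
        grad = (1.0 - np.mean(np.exp(z))) / beta  # d/dv of the loss above
        v -= lr * grad
    return v


rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)
beta = 1.0

v_hat = gumbel_regression(x, beta=beta)
# Closed-form minimizer: E[exp((x - v) / beta)] = 1  =>  v = beta * log E[exp(x / beta)].
v_closed = beta * np.log(np.mean(np.exp(x / beta)))
print(v_hat, v_closed)  # for x ~ N(0, 1) and beta = 1 the population value is log E[exp(x)] = 0.5
```

The loss is convex in `v`, so gradient descent with a modest step size converges. The exponential term can overflow when `v` starts far below the samples, which is why this sketch initializes at the sample mean.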
**Extreme Q-learning (X-QL)** is a novel and simple Q-learning algorithm that models the maximal soft-values (LogSumExp) without needing to sample from a policy. It directly estimates the optimal Bellman operator B* in continuous action spaces, extending Q-iteration to continuous settings.
It obtains state-of-the-art results on offline RL benchmarks such as D4RL and can improve existing online RL methods such as SAC and TD3. It combines maximum-entropy, conservative, and implicit RL in a single framework.
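The X-QL idea of fitting a value by Gumbel regression over logged actions and backing it up into Q can be illustrated in a tabular toy. This is a sketch under stated assumptions, not the repository's implementation: `DATASET`, `log_mean_exp`, and `tabular_xql` are hypothetical names, and instead of gradient descent on neural networks it plugs in the exact per-state minimizer of the Gumbel loss, β log mean exp(Q/β) over the transitions logged at each state, then regresses Q(s, a) onto r + γV(s'). Because the value only aggregates Q-values of dataset actions, no policy is sampled and no out-of-distribution action is queried.

```python
import numpy as np

# Hypothetical offline dataset of (state, action, reward, next_state, done) transitions.
DATASET = [
    (0, 0, 0.0, 1, False),     # state 0, action 0: no reward, move on to state 1
    (0, 1, 1.0, None, True),   # state 0, action 1: small reward, episode ends
    (1, 0, 10.0, None, True),  # state 1, action 0: large reward, episode ends
    (1, 1, 0.0, None, True),   # state 1, action 1: no reward, episode ends
]


def log_mean_exp(values, beta):
    """Numerically stable beta * log(mean(exp(values / beta)))."""
    scaled = np.asarray(values, dtype=float) / beta
    m = scaled.max()
    return beta * (m + np.log(np.mean(np.exp(scaled - m))))


def tabular_xql(dataset, beta=0.01, gamma=0.9, iters=100):
    """Alternate an exact Gumbel-regression value fit with a Bellman backup for Q."""
    q = {(s, a): 0.0 for s, a, _, _, _ in dataset}
    states = sorted({s for s, _, _, _, _ in dataset})
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        # Value step: exact minimizer of the Gumbel loss over Q-values of logged transitions at s.
        for s in states:
            v[s] = log_mean_exp([q[(si, a)] for si, a, _, _, _ in dataset if si == s], beta)
        # Q step: the least-squares fit to the targets r + gamma * V(s') is their mean per (s, a).
        targets = {}
        for s, a, r, s_next, done in dataset:
            targets.setdefault((s, a), []).append(r if done else r + gamma * v[s_next])
        q = {key: float(np.mean(ts)) for key, ts in targets.items()}
    return q, v


q, v = tabular_xql(DATASET)
print(v)
```

With a small β the value at each state sits just below the best logged Q-value (here V(1) is roughly 10 and V(0) roughly 0.9 × 10), which is the soft, in-sample analogue of a max backup.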
# Introduction
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL that directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics. By doing so, we avoid computing Q-values on out-of-distribution actions, which is often a substantial source of error. Our key insight is to introduce an objective that directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting without needing to sample from a policy.
Using EVT, we derive our **Extreme Q-Learning (XQL)** framework and, consequently, online and (for the first time) offline MaxEnt Q-learning algorithms ***that do not explicitly require access to a policy or its entropy.*** Our method obtains consistently strong performance on the D4RL benchmark, outperforming prior works by **10+ points** on some tasks, while offering moderate improvements over SAC and TD3 on online DM Control tasks.
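The EVT fact behind this can be checked numerically: if each action's Q-value is perturbed by independent Gumbel noise with scale β, the maximum over actions is itself Gumbel-distributed with the same scale and location β log Σ_a exp(Q(a)/β), the LogSumExp soft value, so its mean is that location plus β times the Euler-Mascheroni constant. The Monte Carlo sketch below assumes four hypothetical Q-values and β = 0.5; it illustrates the distributional identity only, not the training procedure used in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.5
q = np.array([1.0, 2.0, 2.5, 0.0])  # hypothetical Q-values of four discrete actions
n_samples = 200_000

# Perturb each Q-value with independent Gumbel(0, beta) noise and take the max over actions.
noise = rng.gumbel(loc=0.0, scale=beta, size=(n_samples, q.size))
maxima = (q + noise).max(axis=1)

# EVT: the max is Gumbel with location beta * logsumexp(q / beta) and the same scale beta,
# so its mean is that location plus beta times the Euler-Mascheroni constant.
soft_value = beta * np.log(np.sum(np.exp(q / beta)))
print(maxima.mean(), soft_value + beta * np.euler_gamma)
```

The two printed numbers should agree up to Monte Carlo error; the soft value is always at least the hard maximum of `q`.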
### Citation
```
@article{garg2022extreme,
  title     = {Extreme Q-Learning: MaxEnt Reinforcement Learning Without Entropy},
  url       = {https://arxiv.org/abs/2301.02328},
  author    = {Garg, Divyansh and Hejna, Joey and Geist, Matthieu and Ermon, Stefano},
  publisher = {arXiv},
  year      = {2023},
}
```
## Key Advantages
✅ Directly models V* in continuous action spaces \(Continuous Q-iteration\) \
✅ Implicit: no OOD sampling or actor-critic formulation \
✅ Conservative with respect to the behavior policy \
✅ Improves performance on the D4RL benchmark versus similar approaches
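One way to see the in-sample, conservative behavior is to look at the soft value β log mean exp(Q(s, a)/β) taken only over actions that appear in the data, which is what the Gumbel-loss minimizer computes when the expectation runs over logged actions. It lies between the mean of the logged Q-values and their maximum (and within β log n of that maximum for n logged actions), so it never relies on Q-values of unseen actions. The function name and Q-values below are hypothetical, chosen only to show how β moves the estimate between those bounds.

```python
import numpy as np


def in_sample_soft_value(q_dataset, beta):
    """Stable beta * log(mean(exp(Q / beta))) over Q-values of actions seen in the data."""
    scaled = np.asarray(q_dataset, dtype=float) / beta
    m = scaled.max()
    return float(beta * (m + np.log(np.mean(np.exp(scaled - m)))))


q_dataset = np.array([1.0, 2.0, 3.0])  # hypothetical Q-values of three logged actions at one state
values = {beta: in_sample_soft_value(q_dataset, beta) for beta in (100.0, 1.0, 0.01)}
for beta, value in values.items():
    # Large beta -> close to the mean (2.0); small beta -> close to, but never above, the max (3.0).
    print(f"beta={beta}: {value:.4f}")
```

In this reading, β acts as a knob between averaging over the behavior data and taking an in-sample max.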
## Usage
For exploring Gumbel Regression, you can play with the [Gumbel Regression notebook](https://github.com/Div99/XQL/blob/main/Gumbel_Regression.ipynb) in Google Colab.
This repository is divided into two parts: one for the offline RL experiments and one for the online RL experiments.
To install and run X-QL, follow the instructions in the [Offline folder](offline) for offline RL and the [Online folder](online) for online RL.
## Questions
Please feel free to email us if you have any questions.
Div Garg ([divgarg@stanford.edu](mailto:divgarg@stanford.edu?subject=[GitHub]%20X-QL)), Joey Hejna ([jhejna@stanford.edu](mailto:jhejna@stanford.edu?subject=[GitHub]%20X-QL))