https://github.com/div99/xql

Extreme Q-Learning: Max Entropy RL without Entropy

deep-learning energy-based-model gumbel-distribution offline-rl reinforcement-learning

Last synced: over 1 year ago

Host: GitHub
URL: https://github.com/div99/xql
Owner: Div99
Created: 2023-01-10T06:06:10.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-02-14T20:28:12.000Z (over 3 years ago)
Last Synced: 2025-03-26T15:42:28.446Z (over 1 year ago)
Topics: deep-learning, energy-based-model, gumbel-distribution, offline-rl, reinforcement-learning
Language: Python
Homepage: https://div99.github.io/XQL/
Size: 46.8 MB
Stars: 85
Watchers: 2
Forks: 10
Open Issues: 2
Metadata Files:
- Readme: README.md

# Extreme Q-Learning (X-QL) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Div99/XQL/blob/main/Gumbel_Regression.ipynb)

### [**[Project Page](https://div99.github.io/XQL)**] 

Official code base for **[Extreme Q-Learning: MaxEnt RL without Entropy](https://arxiv.org/abs/2301.02328)** by [Div Garg](https://divyanshgarg.com/)\*, [Joey Hejna](https://jhejna.github.io)\*, [Matthieu Geist](https://scholar.google.com/citations?user=ectPLEUAAAAJ&hl=en), and [Stefano Ermon](https://cs.stanford.edu/~ermon/).

(*Equal Contribution)

This repo contains code for two novel methods: **Gumbel Regression** and **Extreme Q-learning (X-QL)** formulated in our paper. 

**Gumbel Regression** is a novel method that enables accurate and unbiased estimates of the partition function of a distribution using simple gradient descent.
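As a rough illustration of the idea, the sketch below fits a single scalar `h` by gradient descent on an exponential-linear loss, `mean(exp(z) - z - 1)` with `z = (x - h) / beta`, and compares it with the closed-form log-partition estimate `beta * log(mean(exp(x / beta)))`. This is a minimal pure-Python sketch, not the code from this repo: the loss form is assumed to match the Gumbel regression objective in the paper, and the function names, learning rate, and step count are all illustrative choices.

```python
import math
import random


def gumbel_loss_grad(h, xs, beta):
    """Gradient w.r.t. h of mean(exp(z) - z - 1), where z = (x - h) / beta."""
    return sum(1.0 - math.exp((x - h) / beta) for x in xs) / (beta * len(xs))


def gumbel_regression(xs, beta=1.0, lr=0.1, steps=2000):
    """Fit a scalar h by plain gradient descent (hypothetical helper)."""
    h = max(xs)  # start at the max so exp((x - h) / beta) <= 1 at the first step
    for _ in range(steps):
        h -= lr * gumbel_loss_grad(h, xs, beta)
    return h


def log_mean_exp(xs, beta=1.0):
    """Closed-form target beta * log(mean(exp(x / beta))), computed stably."""
    m = max(xs)
    return m + beta * math.log(sum(math.exp((x - m) / beta) for x in xs) / len(xs))


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]  # made-up data
h_fit = gumbel_regression(samples, beta=1.0)
h_true = log_mean_exp(samples, beta=1.0)
print(f"gradient descent: {h_fit:.4f}, closed form: {h_true:.4f}")
```

Setting the gradient to zero gives `mean(exp((x - h) / beta)) = 1`, which is why the fitted `h` should agree with the closed-form value on the same samples. In practice `h` would be a neural network output rather than a scalar.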

**Extreme Q-learning (X-QL)** is a novel and simple RL algorithm for Q-learning that models the maximal soft-values (LogSumExp) without needing to sample from a policy. It directly estimates the optimal Bellman operator B* in continuous action spaces, successfully extending Q-iteration to continuous settings.

It obtains state-of-the-art results on Offline RL benchmarks such as D4RL, and can improve existing Online RL methods like SAC and TD3. It combines Max Entropy, Conservative & Implicit RL in a single framework.
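To make the "Q-iteration without sampling from a policy" idea concrete, here is a tiny tabular sketch. It is not the algorithm as implemented in this repo: it assumes a deterministic toy MDP, uses the closed-form log-mean-exp instead of a learned value network, and the dataset, function names, and hyperparameters are all made up for illustration. The key point it tries to show is that the value step only looks at actions that actually appear in the dataset.

```python
import math

# Hypothetical offline dataset of (state, action, reward, next_state) tuples.
# next_state=None marks a terminal transition. All numbers are made up.
DATASET = [
    (0, 0, 0.0, 1),
    (0, 1, 1.0, None),
    (1, 0, 10.0, None),
    (1, 1, 0.0, None),
]


def soft_value(q_values, beta):
    """beta * log(mean(exp(q / beta))), computed stably."""
    m = max(q_values)
    return m + beta * math.log(sum(math.exp((q - m) / beta) for q in q_values) / len(q_values))


def in_sample_soft_q_iteration(dataset, gamma=0.9, beta=0.1, sweeps=50):
    # Each (s, a) pair appears once here, so the mean is uniform over dataset actions.
    q = {(s, a): 0.0 for s, a, _, _ in dataset}
    v = {}
    for _ in range(sweeps):
        # Value step: soft maximum over actions that appear in the dataset only.
        for s in {s for s, _, _, _ in dataset}:
            v[s] = soft_value([q_sa for (s2, _), q_sa in q.items() if s2 == s], beta)
        # Q step: one-step backup through the soft value of the next state.
        for s, a, r, s_next in dataset:
            q[(s, a)] = r + (gamma * v[s_next] if s_next is not None else 0.0)
    return q, v


q_table, v_table = in_sample_soft_q_iteration(DATASET)
for key in sorted(q_table):
    print(key, round(q_table[key], 4))
```

No action outside the dataset is ever evaluated, and the soft value never exceeds the largest in-sample Q-value, which is one way to read the "implicit" and "conservative" properties claimed for the method.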

# Introduction

Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from Economics. By doing so, we avoid computing Q-values using out-of-distribution actions, which is often a substantial source of error. Our key insight is to introduce an objective that directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting without needing to sample from a policy.
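For intuition on how a soft value relates to the maximal Q-value, the short sketch below evaluates `beta * log(mean(exp(q / beta)))` for a few temperatures. The Q-values, the uniform weighting over actions, and the helper name are assumptions made for illustration only.

```python
import math


def soft_value(q_values, beta):
    """beta * log(mean(exp(q / beta))), computed stably."""
    m = max(q_values)
    return m + beta * math.log(sum(math.exp((q - m) / beta) for q in q_values) / len(q_values))


q_values = [1.0, 2.0, 5.0]  # made-up Q-values for three actions in one state
for beta in (100.0, 1.0, 0.01):
    print(f"beta={beta:>6}: soft value = {soft_value(q_values, beta):.4f}")
print(f"mean = {sum(q_values) / len(q_values):.4f}, max = {max(q_values):.4f}")
```

With a large temperature the soft value sits close to the mean of the Q-values, and as the temperature shrinks it approaches the maximum from below.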



Using EVT, we derive our **Extreme Q-Learning (X-QL)** framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms ***that do not explicitly require access to a policy or its entropy.*** Our method obtains consistently strong performance on the D4RL benchmark, outperforming prior works by **10+ points** on some tasks while offering moderate improvements over SAC and TD3 on online DM Control tasks.

### Citation

```
@article{garg2022extreme,
  title = {Extreme Q-Learning: MaxEnt Reinforcement Learning Without Entropy},
  url = {https://arxiv.org/abs/2301.02328},
  author = {Garg, Divyansh and Hejna, Joey and Geist, Matthieu and Ermon, Stefano},
  publisher = {arXiv},
  year = {2023},
}
```

## Key Advantages

✅  Directly models V* in continuous action spaces (continuous Q-iteration) \
✅  Implicit: no OOD sampling and no actor-critic formulation \
✅  Conservative with respect to the behavior policy \
✅  Improves performance on the D4RL benchmark versus similar approaches

## Usage

For exploring Gumbel Regression, you can play with the [Gumbel Regression notebook](https://github.com/Div99/XQL/blob/main/Gumbel_Regression.ipynb) in Google Colab. 

This repository is divided into two subparts, one for the offline RL and one for the online RL experiments.

To install and use X-QL, follow the instructions in the [Offline folder](offline) for running Offline RL and the [Online folder](online) for running Online RL.

## Questions

Please feel free to email us if you have any questions. 

Div Garg ([divgarg@stanford.edu](mailto:divgarg@stanford.edu?subject=[GitHub]%X-QL)), Joey Hejna ([jhejna@stanford.edu](mailto:jhejna@stanford.edu?subject=[GitHub]%X-QL))
