{"id":24518164,"url":"https://github.com/div99/xql","last_synced_at":"2025-04-14T09:43:13.058Z","repository":{"id":78672464,"uuid":"587174423","full_name":"Div99/XQL","owner":"Div99","description":"Extreme Q-Learning: Max Entropy RL without Entropy","archived":false,"fork":false,"pushed_at":"2023-02-14T20:28:12.000Z","size":49072,"stargazers_count":85,"open_issues_count":2,"forks_count":10,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-26T15:42:28.446Z","etag":null,"topics":["deep-learning","energy-based-model","gumbel-distribution","offline-rl","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://div99.github.io/XQL/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Div99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-01-10T06:06:10.000Z","updated_at":"2025-03-26T10:35:20.000Z","dependencies_parsed_at":"2023-03-15T16:30:48.367Z","dependency_job_id":null,"html_url":"https://github.com/Div99/XQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Div99%2FXQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Div99%2FXQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Div99%2FXQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Div99%2FXQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Div99","download_url":"https://codeload.github.com/Div99/XQL/tar.gz/refs/heads/main","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248856226,"owners_count":21172716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","energy-based-model","gumbel-distribution","offline-rl","reinforcement-learning"],"created_at":"2025-01-22T01:39:24.725Z","updated_at":"2025-04-14T09:43:13.031Z","avatar_url":"https://github.com/Div99.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extreme Q-Learning (X-QL) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Div99/XQL/blob/main/Gumbel_Regression.ipynb)\n\n### [**[Project Page](https://div99.github.io/XQL)**] \n\n\n\nOfficial code base for **[Extreme Q-Learning: MaxEnt RL without Entropy](https://arxiv.org/abs/2301.02328)** by [Div Garg](https://divyanshgarg.com/)\\*, [Joey Hejna](https://jhejna.github.io)\\*, [Matthieu Geist](https://scholar.google.com/citations?user=ectPLEUAAAAJ\u0026hl=en), and [Stefano Ermon](https://cs.stanford.edu/~ermon/).\n(*Equal Contribution)\n\nThis repo contains code for two novel methods: **Gumbel Regression** and **Extreme Q-learning (X-QL)** formulated in our paper. \n\n**Gumbel Regression** is a novel method that enables accurate and unbiased estimates of the Partition function over a distribution using simple gradient descent.\n\n**Extreme Q-learning (X-QL)** is a novel \u0026 simple RL algorithm for Q-learning that models the maximal soft-values (LogSumExp) without needing to sample from a policy. 
It directly estimates the optimal Bellman operator B* in continuous action spaces, successfully extending Q-iteration to continuous settings.\n\nIt obtains state-of-the-art results on Offline RL benchmarks such as D4RL, and can improve existing Online RL methods like SAC and TD3. It combines Max Entropy, Conservative \u0026 Implicit RL in a single framework.\n# Introduction\n\n\u003ca href=\"https://div99.github.io/XQL\"\u003e\n\u003cp align=\"center\"\u003e\n\t\u003cimg class=\"img\" src=\"docs/diagram.svg\" style=\"width:400px;height:auto;\"\u003e\u003cbr\u003e\n\u003c/p\u003e\n\u003c/a\u003e\n\nModern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from Economics. By doing so, we avoid computing Q-values using out-of-distribution actions, which is often a substantial source of error. Our key insight is to introduce an objective that directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting without needing to sample from a policy. 
\u003cbr\u003e\u003cbr\u003e\nUsing EVT, we derive our **Extreme Q-Learning (XQL)** framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms,  ***that do not explicitly require access to a policy or its entropy.*** Our method obtains consistently strong performance in the D4RL benchmark, outperforming prior works by **10+ points** on some tasks while offering moderate improvements over SAC and TD3 on online DM Control tasks.\n\n\n### Citation\n```\n@article{\n\tgarg2022extreme,\n\ttitle={Extreme Q-Learning: MaxEnt Reinforcement Learning Without Entropy},\n\turl = {https://arxiv.org/abs/2301.02328},\n  \tauthor = {Garg, Divyansh and Hejna, Joey and Geist, Matthieu and Ermon, Stefano},\n\tpublisher = {arXiv},\n  \tyear = {2023},\n\t}\n```\n\n## Key Advantages\n\n✅  Directly models V* in continuous action spaces \\(Continuous Q-iteration\\)   \\\n✅  Implicit, no OOD Sampling or actor-critic formulation \\\n✅  Conservative with respect to the behavior policy \\\n✅  Improves performance on the D4RL benchmark versus similar approaches\n\n## Usage\n\nFor exploring Gumbel Regression, you can play with the [Gumbel Regression notebook](https://github.com/Div99/XQL/blob/main/Gumbel_Regression.ipynb) in Google Colab. \n\nThis repository is divided into two subparts, one for the offline RL and one for the online RL experiments.\nTo install and use X-QL, check the instructions provided in the [Offline folder](offline) for running Offline RL and [Online folder](online) for running Online RL.\n\n\n## Questions\nPlease feel free to email us if you have any questions. 
\n\nDiv Garg ([divgarg@stanford.edu](mailto:divgarg@stanford.edu?subject=[GitHub]%X-QL)), Joey Hejna ([jhejna@stanford.edu](mailto:jhejna@stanford.edu?subject=[GitHub]%X-QL))\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiv99%2Fxql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiv99%2Fxql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiv99%2Fxql/lists"}