https://github.com/manchery/iql-pytorch
Unofficial PyTorch implementation (replicating paper results) of Implicit Q-Learning (In-sample Q-Learning) for offline RL
implicit-q-learning offline-reinforcement-learning pytorch reinforcement-learning
- Host: GitHub
- URL: https://github.com/manchery/iql-pytorch
- Owner: Manchery
- Created: 2022-03-17T02:25:29.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-11-04T15:09:34.000Z (over 1 year ago)
- Last Synced: 2025-04-13T21:06:29.918Z (about 1 year ago)
- Topics: implicit-q-learning, offline-reinforcement-learning, pytorch, reinforcement-learning
- Language: Python
- Homepage:
- Size: 261 KB
- Stars: 23
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# IQL Implementation in PyTorch
## IQL
This repo is an unofficial implementation of **Implicit Q-Learning (In-sample Q-Learning)** in PyTorch.
```
@inproceedings{
kostrikov2022offline,
title={Offline Reinforcement Learning with Implicit Q-Learning},
author={Ilya Kostrikov and Ashvin Nair and Sergey Levine},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=68n2s9ZJWF8}
}
```
**Note**: The reward standardization used in the [official implementation](https://github.com/ikostrikov/implicit_q_learning/blob/09d700248117881a75cb21f0adb95c6c8a694cb2/train_offline.py#L51C18-L51C18) (_We standardize MuJoCo locomotion task rewards by dividing by the difference of returns of the best and worst trajectories in each dataset_) is not included in this implementation. It is straightforward to add as a preprocessing step on the dataset rewards.
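A minimal sketch of that standardization, assuming the dataset is available as a flat reward array plus a boolean array marking the last step of each trajectory. The function name `standardize_rewards`, the `episode_ends` argument, and the `scale` parameter are hypothetical and are not part of this repo. The default `scale=1000.0` mirrors what the linked official script appears to do after dividing; treat it as an assumption and set it to `1.0` for the plain division quoted above.

```python
import numpy as np


def standardize_rewards(rewards, episode_ends, scale=1000.0):
    """Divide rewards by (best trajectory return - worst trajectory return).

    rewards:      1-D array of per-step rewards for the whole dataset.
    episode_ends: 1-D boolean array, True on the last step of each trajectory
                  (hypothetical input; derive it from terminals/timeouts).
    scale:        extra multiplier applied after the division (assumption).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    episode_ends = np.asarray(episode_ends, dtype=bool)

    # Accumulate the undiscounted return of every trajectory.
    returns = []
    running = 0.0
    for reward, is_end in zip(rewards, episode_ends):
        running += reward
        if is_end:
            returns.append(running)
            running = 0.0
    # Count a trailing trajectory that was cut off without an end marker.
    if len(rewards) > 0 and not episode_ends[-1]:
        returns.append(running)

    if len(returns) < 2:
        raise ValueError("need at least two trajectories to standardize")
    span = max(returns) - min(returns)
    if span <= 0:
        raise ValueError("best and worst trajectory returns are equal")

    return rewards / span * scale
```

The returned array would replace the dataset rewards before they are loaded into the replay buffer; where exactly that happens in `main_iql.py` is not shown here.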
## Train
### Gym-MuJoCo
```
python main_iql.py --env halfcheetah-medium-v2 --expectile 0.7 --temperature 3.0 --eval_freq 5000 --eval_episodes 10 --normalize
```
### AntMaze
```
python main_iql.py --env antmaze-medium-play-v2 --expectile 0.9 --temperature 10.0 --eval_freq 50000 --eval_episodes 100
```
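The `--expectile` and `--temperature` flags in both commands above correspond to the two main IQL hyperparameters in the paper: the expectile sets how asymmetric the value-function regression is, and the temperature scales advantages in the advantage-weighted policy update. The sketch below illustrates both quantities in NumPy. The function names and the `max_weight` clip are assumptions for illustration (the clip at 100 mirrors what the official implementation appears to do), and this is not the code used by `main_iql.py`.

```python
import numpy as np


def expectile_loss(diff, expectile):
    """Asymmetric squared loss on diff = Q(s, a) - V(s).

    Positive differences are weighted by `expectile`, negative ones by
    `1 - expectile`, so an expectile above 0.5 pushes V toward the upper
    range of Q values seen in the dataset.
    """
    diff = np.asarray(diff, dtype=np.float64)
    weight = np.where(diff > 0, expectile, 1.0 - expectile)
    return float(np.mean(weight * diff ** 2))


def advantage_weights(q, v, temperature, max_weight=100.0):
    """Weights exp(temperature * (Q - V)) for weighted behavior cloning.

    `max_weight` clips the exponential for numerical stability (assumption).
    """
    advantage = np.asarray(q, dtype=np.float64) - np.asarray(v, dtype=np.float64)
    return np.minimum(np.exp(temperature * advantage), max_weight)


# With expectile 0.7, overestimating errors (diff > 0) cost more than
# underestimating ones of the same magnitude.
loss = expectile_loss([1.0, -1.0], expectile=0.7)
weights = advantage_weights(q=[1.0, 2.0], v=[1.0, 1.0], temperature=3.0)
```

A higher expectile makes the value estimate closer to a maximum over in-dataset actions, and a higher temperature makes the policy concentrate more sharply on high-advantage actions.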
## Results


## Acknowledgement
This repo borrows heavily from [sfujim/TD3_BC](https://github.com/sfujim/TD3_BC) and [ikostrikov/implicit_q_learning](https://github.com/ikostrikov/implicit_q_learning).