https://github.com/nikhilweee/offline-rl-iigs
# Offline RL for POMDPs
## Requirements
```console
$ pip install open_spiel
```
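If the installation succeeded, you should be able to load Leduc poker through OpenSpiel's Python bindings. This is just a quick sanity check, not a script in the repository:

```python
import pyspiel

# Quick check that open_spiel is installed and exposes Leduc poker.
game = pyspiel.load_game("leduc_poker")
print(game.num_players(), game.num_distinct_actions())
```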
## Description
* `leduc_fp.py` runs fictitious play on Leduc poker for 1000 iterations and collects trajectories using three different strategies:
  * `expert` collects 1M trajectories at the end of training, so this dataset only contains expert observations.
  * `mixed-const` collects 1000 trajectories after every iteration, so this dataset contains mixed observations sampled at a constant rate.
  * `mixed-exp` collects a varying number of samples after each iteration according to an exponential decay schedule.
* `leduc_bc.py` runs behaviour cloning by training a simple MLP on observations collected from one of the strategies above.
* `leduc_crr.py` runs tabular CRR (Critic Regularized Regression) using observations collected from one of the strategies above.
* `observation.py` holds the observation data structure for easy conversion to / from CSVs (see the sketch after this list).
* `submit_bc.sh` / `submit_fp.sh` are sbatch scripts to run the above scripts on NYU Greene.
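As a rough illustration of what a CSV-backed observation record could look like, here is a minimal sketch. The field names are assumptions; the actual layout in `observation.py` may differ.

```python
from dataclasses import dataclass, astuple

# Illustrative sketch only: the real fields in observation.py may differ.
@dataclass
class Observation:
    info_state: str  # serialized information-state tensor (assumed field)
    action: int      # action taken by the acting player (assumed field)
    reward: float    # reward received after the action (assumed field)

    def to_row(self):
        # One CSV row per observation.
        return list(astuple(self))

    @classmethod
    def from_row(cls, row):
        return cls(str(row[0]), int(row[1]), float(row[2]))

# Round-trip a single record through its CSV-row form.
print(Observation.from_row(Observation("0 1 0 0 1 0", 2, 1.5).to_row()))
```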
## Walkthrough
* First, collect observations using one of the three strategies:
```console
$ python leduc_fp.py --mode expert --num_iterations 100 --num_episodes 1_000 --traj trajectories-1k.csv
```
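After the run finishes you can sanity check the output file, for example by counting the rows that were written. This is a throwaway snippet, not part of the repository:

```python
import csv

# Count the rows written by leduc_fp.py in the previous step.
with open("trajectories-1k.csv") as f:
    num_rows = sum(1 for _ in csv.reader(f))
print(f"{num_rows} rows in trajectories-1k.csv")
```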
* Next, you can run behaviour cloning using the following command:
```console
$ python leduc_bc.py --traj trajectories-1k.csv
```
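Behaviour cloning here is just supervised learning of the logged actions from the logged observations. Below is a minimal sketch of that idea, assuming PyTorch, fixed-length observation vectors, and integer action labels; the actual model and training loop in `leduc_bc.py` may differ.

```python
import torch
import torch.nn as nn

# Minimal behaviour-cloning sketch: predict the logged action from the observation.
# Dimensions and hyperparameters are placeholders, not values from leduc_bc.py.
obs_dim, num_actions = 30, 3
model = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, num_actions),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(observations, actions):
    # observations: (batch, obs_dim) float tensor; actions: (batch,) long tensor
    logits = model(observations)
    loss = loss_fn(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```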
* You can also run tabular CRR using the following command. Note that this is still very much a work in progress.
```console
$ python leduc_crr.py --traj trajectories-1k.csv
```
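For context, CRR weights behaviour cloning by the exponentiated advantage of each logged action, so better-than-average actions in the dataset are imitated more strongly. Here is a rough tabular sketch of that policy-update idea, assuming transitions of the form `(state, action, reward, next_state)` and a Q-table already fitted on the same offline data; it is not the exact algorithm in `leduc_crr.py`.

```python
import math
from collections import defaultdict

# Rough tabular CRR sketch, not the exact algorithm in leduc_crr.py.
# Q is assumed to be a defaultdict(float) fitted on the same offline data;
# beta is the CRR temperature.
def crr_policy_weights(dataset, Q, num_actions, beta=1.0):
    weights = defaultdict(float)  # unnormalized policy weights per (state, action)
    for state, action, _reward, _next_state in dataset:
        q_values = [Q[(state, a)] for a in range(num_actions)]
        baseline = sum(q_values) / num_actions     # crude V(s) under a uniform reference
        advantage = Q[(state, action)] - baseline  # A(s, a)
        # Advantage-weighted imitation: copy the logged action with weight exp(A / beta).
        weights[(state, action)] += math.exp(min(advantage / beta, 20.0))
    return weights

# Tiny example: the logged action with the higher Q-value gets the larger weight.
Q = defaultdict(float, {("s0", 1): 1.0})
data = [("s0", 1, 1.0, "s1"), ("s0", 0, 0.0, "s1")]
print(crr_policy_weights(data, Q, num_actions=2))
```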