https://github.com/google-research/pisac
TensorFlow 2 source code for the PI-SAC agent from "Predictive Information Accelerates Learning in RL" (NeurIPS 2020)
- Host: GitHub
- URL: https://github.com/google-research/pisac
- Owner: google-research
- License: apache-2.0
- Created: 2020-10-13T20:41:50.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-06-08T16:50:49.000Z (about 3 years ago)
- Last Synced: 2025-04-03T01:01:52.666Z (about 1 year ago)
- Topics: deep-learning, deep-reinforcement-learning, information-theory, machine-learning, reinforcement-learning, robotics, vision
- Language: Python
- Homepage:
- Size: 43.9 KB
- Stars: 44
- Watchers: 6
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# PI-SAC: Predictive Information Accelerates Learning in RL
[Kuang-Huei Lee][leekh], [Ian Fischer][iansf], [Anthony Liu][aliu],
[Yijie Guo][yguo], [Honglak Lee][honglak], [John Canny][canny],
[Sergio Guadarrama][sguada]
NeurIPS 2020





This repository hosts the open source implementation of PI-SAC, the
reinforcement learning agent introduced in
[Predictive Information Accelerates Learning in RL][paper]. PI-SAC combines the
Soft Actor-Critic Agent with an additional objective that learns compressive
representations of predictive information. PI-SAC agents can substantially
improve sample efficiency and returns over challenging baselines on tasks from
the [DeepMind Control Suite][dmc_paper] of vision-based continuous control
environments, where observations are pixels.
[paper]: https://arxiv.org/abs/2007.12401
[pdf_paper]: https://arxiv.org/pdf/2007.12401.pdf
[leekh]: https://scholar.google.com/citations?user=rE7-N30AAAAJ
[iansf]: https://scholar.google.com/citations?user=Z63Zf_0AAAAJ
[aliu]: https://scholar.google.com/citations?user=TjEqCOAAAAAJ
[yguo]: https://scholar.google.com/citations?user=ONuIPv0AAAAJ
[honglak]: https://scholar.google.com/citations?user=fmSHtE8AAAAJ
[canny]: https://scholar.google.com/citations?user=LAv0HTEAAAAJ
[sguada]: https://scholar.google.com/citations?user=gYiCq88AAAAJ
[dmc_paper]: https://arxiv.org/abs/1801.00690
If you find this useful for your research, please cite it as follows:
```
@article{lee2020predictive,
title={Predictive Information Accelerates Learning in RL},
author={Lee, Kuang-Huei and Fischer, Ian and Liu, Anthony and Guo, Yijie and Lee, Honglak and Canny, John and Guadarrama, Sergio},
journal={arXiv preprint arXiv:2007.12401},
year={2020}
}
```
## Methods

In addition to actor and critic learning, PI-SAC learns compact representations
of the predictive information I(X_past;Y_future), which captures the environment
transition dynamics. We capture the predictive information in a representation Z
by maximizing I(Y_future;Z) and minimizing I(X_past;Z|Y_future), which
compresses out the non-predictive part for better generalization. This is
reflected in better sample efficiency, returns, and transferability. When
interacting with the environment, the agent simply executes the actor model.
Find out more:
- [PDF paper][pdf_paper]
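As a rough sketch, the two terms described above can be combined into a single objective over the representation Z. The trade-off weight β is an assumption introduced here for illustration; this README does not name it, and the paper gives the exact objective and the variational bounds used to optimize it in practice.

```latex
\max_{Z} \; I(Y_{\text{future}}; Z) \;-\; \beta \, I(X_{\text{past}}; Z \mid Y_{\text{future}})
```

The first term keeps the information in Z that predicts the future; the second penalizes whatever Z retains about the past that the future does not account for.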
## Training and Evaluation
To train the model(s) in the paper with periodic evaluation, run this command:
```train
python -m pisac.run --root_dir=/tmp/pisac_cartpole_swingup \
--gin_file=pisac/config/pisac.gin \
--gin_bindings=train_pisac.train_eval.domain_name=\'cartpole\' \
--gin_bindings=train_pisac.train_eval.task_name=\'swingup\' \
--gin_bindings=train_pisac.train_eval.action_repeat=4 \
--gin_bindings=train_pisac.train_eval.initial_collect_steps=1000 \
--gin_bindings=train_pisac.train_eval.initial_feature_step=5000
```
We use `gin` to configure hyperparameters. The default configs are specified in
`pisac/config/pisac.gin`. To reproduce the main DM-Control experiments, you need
to specify a different `domain_name`, `task_name`, `action_repeat`,
`initial_collect_steps`, and `initial_feature_step` for each environment.
`domain_name` | `task_name` | `action_repeat` | `initial_collect_steps` | `initial_feature_step`
:------------ | :------------- | :-------------- | :---------------------- | :---------------------
cartpole | swingup | 4 | 1000 | 5000
cartpole | balance_sparse | 2 | 1000 | 5000
reacher | easy | 4 | 1000 | 5000
ball_in_cup | catch | 4 | 1000 | 5000
finger | spin | 1 | 10000 | 0
cheetah | run | 4 | 10000 | 10000
walker | walk | 2 | 10000 | 10000
walker | stand | 2 | 10000 | 10000
hopper | stand | 2 | 10000 | 10000
To use multiple gradient steps per environment step, change
`train_pisac.train_eval.collect_every` to a number larger than 1.
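To avoid retyping the bindings for every row of the table above, the commands could be generated with a small script. This is a minimal sketch, not part of the repository: `ENV_SETTINGS`, `build_command`, and the `extra_bindings` parameter are hypothetical names, and the `/tmp/pisac_<domain>_<task>` root directory simply generalizes the pattern from the cartpole example.

```python
import shlex

# Rows copied from the table above:
# (domain_name, task_name, action_repeat, initial_collect_steps, initial_feature_step)
ENV_SETTINGS = [
    ("cartpole", "swingup", 4, 1000, 5000),
    ("cartpole", "balance_sparse", 2, 1000, 5000),
    ("reacher", "easy", 4, 1000, 5000),
    ("ball_in_cup", "catch", 4, 1000, 5000),
    ("finger", "spin", 1, 10000, 0),
    ("cheetah", "run", 4, 10000, 10000),
    ("walker", "walk", 2, 10000, 10000),
    ("walker", "stand", 2, 10000, 10000),
    ("hopper", "stand", 2, 10000, 10000),
]

PREFIX = "train_pisac.train_eval."


def build_command(domain, task, action_repeat, collect_steps, feature_step,
                  root_dir=None, extra_bindings=()):
    """Return the argv list for one training run (hypothetical helper)."""
    if root_dir is None:
        # Assumption: follow the naming used in the cartpole example.
        root_dir = f"/tmp/pisac_{domain}_{task}"
    bindings = [
        f"{PREFIX}domain_name='{domain}'",
        f"{PREFIX}task_name='{task}'",
        f"{PREFIX}action_repeat={action_repeat}",
        f"{PREFIX}initial_collect_steps={collect_steps}",
        f"{PREFIX}initial_feature_step={feature_step}",
    ]
    bindings.extend(extra_bindings)
    argv = ["python", "-m", "pisac.run",
            f"--root_dir={root_dir}",
            "--gin_file=pisac/config/pisac.gin"]
    argv.extend(f"--gin_bindings={b}" for b in bindings)
    return argv


if __name__ == "__main__":
    # Print one shell-quoted command per environment.
    for row in ENV_SETTINGS:
        print(" ".join(shlex.quote(a) for a in build_command(*row)))
```

Because the function returns an argument list, it can be passed directly to `subprocess.run(argv, check=True)` without shell quoting. An additional binding such as `collect_every` can be supplied through `extra_bindings`, for example `extra_bindings=("train_pisac.train_eval.collect_every=2",)`, where the value 2 is only illustrative.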
## Results
### DeepMind Control Suite

\*gs: number of gradient steps per environment step
## Requirements
The PI-SAC code uses Python 3 and these packages:
- tensorflow-gpu==2.3.0
- tf_agents==0.6.0
- tensorflow_probability
- dm_control (`egl` [rendering option][rendering] recommended)
- gym
- imageio
- matplotlib
- scikit-image
- scipy
- gin (installed from PyPI as `gin-config`)
- pstar
- qj
If dm_control reports threading issues, try adding
`--gin_bindings=train_pisac.train_eval.drivers_in_graph=False` to run the
dm_control environment outside of the TensorFlow graph.
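For the `egl` rendering option recommended in the package list, dm_control reads the `MUJOCO_GL` environment variable to choose its rendering backend. The sketch below shows one way to set it from Python before dm_control is imported; `select_mujoco_renderer` is a hypothetical helper and is not part of this repository.

```python
import os


def select_mujoco_renderer(backend="egl"):
    """Set MUJOCO_GL unless it is already set, and return the value in effect.

    Hypothetical helper: call it before importing dm_control, because the
    rendering backend is chosen when dm_control is first imported.
    """
    return os.environ.setdefault("MUJOCO_GL", backend)


chosen = select_mujoco_renderer()
print(f"MUJOCO_GL={chosen}")
# Import dm_control only after this point, e.g.:
# from dm_control import suite
```

Exporting `MUJOCO_GL=egl` in the shell before launching training has the same effect. Using `setdefault` leaves any backend you have already chosen untouched.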
[rendering]: https://github.com/deepmind/dm_control#rendering
Disclaimer: This is not an official Google product.