{"id":19401114,"url":"https://github.com/google-research/pisac","last_synced_at":"2025-06-23T16:41:43.669Z","repository":{"id":69176219,"uuid":"303823972","full_name":"google-research/pisac","owner":"google-research","description":"Tensorflow 2 source code for the PI-SAC agent from \"Predictive Information Accelerates Learning in RL\" (NeurIPS 2020)","archived":false,"fork":false,"pushed_at":"2023-06-08T16:50:49.000Z","size":45,"stargazers_count":44,"open_issues_count":1,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-03T01:01:52.666Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","information-theory","machine-learning","reinforcement-learning","robotics","vision"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-10-13T20:41:50.000Z","updated_at":"2025-02-08T23:41:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"be63cf61-af84-45ce-90e5-ecde9f7dd2af","html_url":"https://github.com/google-research/pisac","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fpisac","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fpisac/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fpisac/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/google-research%2Fpisac/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/pisac/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250582780,"owners_count":21453912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","information-theory","machine-learning","reinforcement-learning","robotics","vision"],"created_at":"2024-11-10T11:17:14.670Z","updated_at":"2025-04-24T07:30:33.603Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PI-SAC: Predictive Information Accelerates Learning in RL\n\n[Kuang-Huei Lee][leekh], [Ian Fischer][iansf], [Anthony Liu][aliu],\n[Yijie Guo][yguo], [Honglak Lee][honglak], [John Canny][canny],\n[Sergio Guadarrama][sguada]\n\nNeurIPS 2020\n\n![cheetah_video](https://user-images.githubusercontent.com/4847452/95011238-33857a00-05e4-11eb-9224-7913a8859381.gif)\n![walker_video](https://user-images.githubusercontent.com/4847452/95011273-50ba4880-05e4-11eb-87d2-8a5c0ab54bc7.gif)\n![bic_video](https://user-images.githubusercontent.com/4847452/95011243-3c764b80-05e4-11eb-907e-3e0790bff4e1.gif)\n![cartpole_video](https://user-images.githubusercontent.com/4847452/95011256-413aff80-05e4-11eb-964a-37a333412245.gif)\n![finger_video](https://user-images.githubusercontent.com/4847452/95011270-4d26c180-05e4-11eb-9524-0db5dbc7c7ce.gif)\n\nThis repository hosts 
the open source implementation of PI-SAC, the\nreinforcement learning agent introduced in\n[Predictive Information Accelerates Learning in RL][paper]. PI-SAC combines the\nSoft Actor-Critic Agent with an additional objective that learns compressive\nrepresentations of predictive information. PI-SAC agents can substantially\nimprove sample efficiency and returns over challenging baselines on tasks from\nthe [DeepMind Control Suite][dmc_paper] of vision-based continuous control\nenvironments, where observations are pixels.\n\n[paper]: https://arxiv.org/abs/2007.12401\n[pdf_paper]: https://arxiv.org/pdf/2007.12401.pdf\n[leekh]: https://scholar.google.com/citations?user=rE7-N30AAAAJ\n[iansf]: https://scholar.google.com/citations?user=Z63Zf_0AAAAJ\n[aliu]: https://scholar.google.com/citations?user=TjEqCOAAAAAJ\n[yguo]: https://scholar.google.com/citations?user=ONuIPv0AAAAJ\n[honglak]: https://scholar.google.com/citations?user=fmSHtE8AAAAJ\n[canny]: https://scholar.google.com/citations?user=LAv0HTEAAAAJ\n[sguada]: https://scholar.google.com/citations?user=gYiCq88AAAAJ\n[dmc_paper]: https://arxiv.org/abs/1801.00690\n\nIf you find this useful for your research, please use the following to\nreference:\n\n```\n@article{lee2020predictive,\n  title={Predictive Information Accelerates Learning in RL},\n  author={Lee, Kuang-Huei and Fischer, Ian and Liu, Anthony and Guo, Yijie and Lee, Honglak and Canny, John and Guadarrama, Sergio},\n  journal={arXiv preprint arXiv:2007.12401},\n  year={2020}\n}\n```\n\n## Methods\n\n![pi2small](https://user-images.githubusercontent.com/4847452/95029558-e7771b80-065d-11eb-8f8b-7c2ecffc1222.png)\n\nPI-SAC learns compact representations of the predictive information\nI(X_past;Y_future) that captures the environment transition dynamics, in\naddition to actor and critic learning. 
We capture the predictive information in\na representation Z by maximizing I(Y_future;Z) and minimizing\nI(X_past;Z|Y_future) to compress out the non-predictive part for better\ngeneralization, which is reflected in better sample efficiency, returns, and\ntransferability. When interacting with the environment, it simply executes the\nactor model.\n\nFind out more:\n\n-   [PDF paper][pdf_paper]\n\n## Training and Evaluation\n\nTo train the model(s) in the paper with periodic evaluation, run this command:\n\n```train\npython -m pisac.run --root_dir=/tmp/pisac_cartpole_swingup \\\n--gin_file=pisac/config/pisac.gin \\\n--gin_bindings=train_pisac.train_eval.domain_name=\\'cartpole\\' \\\n--gin_bindings=train_pisac.train_eval.task_name=\\'swingup\\' \\\n--gin_bindings=train_pisac.train_eval.action_repeat=4 \\\n--gin_bindings=train_pisac.train_eval.initial_collect_steps=1000 \\\n--gin_bindings=train_pisac.train_eval.initial_feature_step=5000\n```\n\nWe use `gin` to configure hyperparameters. The default configs are specified in\n`pisac/config/pisac.gin`. 
To reproduce the main DM-Control experiments, you need\nto specify different `domain_name`, `task_name`, `action_repeat`,\n`initial_collect_steps`, `initial_feature_step` for each environment.\n\n`domain_name` | `task_name`    | `action_repeat` | `initial_collect_steps` | `initial_feature_step`\n:------------ | :------------- | :-------------- | :---------------------- | :---------------------\ncartpole      | swingup        | 4               | 1000                    | 5000\ncartpole      | balance_sparse | 2               | 1000                    | 5000\nreacher       | easy           | 4               | 1000                    | 5000\nball_in_cup   | catch          | 4               | 1000                    | 5000\nfinger        | spin           | 1               | 10000                   | 0\ncheetah       | run            | 4               | 10000                   | 10000\nwalker        | walk           | 2               | 10000                   | 10000\nwalker        | stand          | 2               | 10000                   | 10000\nhopper        | stand          | 2               | 10000                   | 10000\n\nTo use multiple gradient steps per environment step, change\n`train_pisac.train_eval.collect_every` to a number larger than 1.\n\n## Results\n\n### DeepMind Control Suite\n\n![pisac_full2](https://user-images.githubusercontent.com/4847452/95033853-71ca7a00-0674-11eb-8a0f-afea8e63b4bf.png)\n\n\\*gs: number of gradient steps per environment step\n\n## Requirements\n\nThe PI-SAC code uses Python 3 and these packages:\n\n-   tensorflow-gpu==2.3.0\n-   tf_agents==0.6.0\n-   tensorflow_probability\n-   dm_control (`egl` [rendering option][rendering] recommended)\n-   gym\n-   imageio\n-   matplotlib\n-   scikit-image\n-   scipy\n-   gin\n-   pstar\n-   qj\n\nIf you ever see that dm_control complains about some threading issues, please\ntry adding `--gin_bindings=train_pisac.train_eval.drivers_in_graph=False` to put\ndm_control environment 
outside of the TensorFlow graph.\n\n[rendering]: https://github.com/deepmind/dm_control#rendering\n\nDisclaimer: This is not an official Google product.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fpisac","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fpisac","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fpisac/lists"}