{"id":13435502,"url":"https://github.com/keiohta/tf2rl","last_synced_at":"2026-01-14T08:26:05.357Z","repository":{"id":40643734,"uuid":"180587550","full_name":"keiohta/tf2rl","owner":"keiohta","description":"TensorFlow2 Reinforcement Learning","archived":false,"fork":false,"pushed_at":"2022-02-13T02:49:47.000Z","size":9025,"stargazers_count":476,"open_issues_count":39,"forks_count":103,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-10-27T08:10:27.714Z","etag":null,"topics":["deep-reinforcement-learning","imitation-learning","inverse-reinforcement-learning","reinforcement-learning","tensorflow","tensorflow2"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keiohta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-10T13:27:05.000Z","updated_at":"2025-10-24T16:21:09.000Z","dependencies_parsed_at":"2022-08-03T01:30:10.323Z","dependency_job_id":null,"html_url":"https://github.com/keiohta/tf2rl","commit_stats":null,"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/keiohta/tf2rl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keiohta%2Ftf2rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keiohta%2Ftf2rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keiohta%2Ftf2rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keiohta%2Ftf2rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keiohta","download_url":"https://codeload.github.com/k
eiohta/tf2rl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keiohta%2Ftf2rl/sbom","scorecard":{"id":553907,"data":{"date":"2025-08-11","repo":{"name":"github.com/keiohta/tf2rl","commit":"43523930b3328b28fcf2ce64e6a9a8cf4a403044"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.4,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":10,"reason":"all changesets reviewed","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/doc.yml:1","Warn: no topLevel permission defined: 
.github/workflows/docker.yml:1","Warn: no topLevel permission defined: .github/workflows/linter.yml:1","Warn: no topLevel permission defined: .github/workflows/publish_to_pypi.yml:1","Warn: no topLevel permission defined: .github/workflows/test.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: 
project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/doc.yml:9: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/doc.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/doc.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/doc.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/doc.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/doc.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/doc.yml:29: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/doc.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/doc.yml:33: update your workflow 
using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/doc.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docker.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/docker.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docker.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/docker.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docker.yml:24: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/docker.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/linter.yml:9: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/linter.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/linter.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/linter.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish_to_pypi.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/publish_to_pypi.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish_to_pypi.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/publish_to_pypi.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/publish_to_pypi.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/publish_to_pypi.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:35: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/test.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: 
.github/workflows/test.yml:36: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/test.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/keiohta/tf2rl/test.yml/master?enable=pin","Warn: containerImage not pinned by hash: Dockerfile:2: pin your Docker image by updating python:3.7 to python:3.7@sha256:eedf63967cdb57d8214db38ce21f105003ed4e4d0358f02bedc057341bcf92a0","Warn: containerImage not pinned by hash: Dockerfile.nvidia:2: pin your Docker image by updating tensorflow/tensorflow:2.2.1-gpu to tensorflow/tensorflow:2.2.1-gpu@sha256:7850117694a707c7ff5cdfc006cd585dccc2425dbac14b2e86bf8b5d5231131c","Warn: pipCommand not pinned by hash: Dockerfile:10-17","Warn: pipCommand not pinned by hash: Dockerfile:22","Warn: pipCommand not pinned by hash: Dockerfile.nvidia:14-20","Warn: pipCommand not pinned by hash: Dockerfile.nvidia:25","Warn: pipCommand not pinned by hash: .github/workflows/doc.yml:24","Warn: pipCommand not pinned by hash: .github/workflows/doc.yml:25","Warn: pipCommand not pinned by hash: .github/workflows/doc.yml:26","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:50","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:53","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:56","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:57","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:58","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:59","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:62","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:65","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:68","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:71","Warn: pipCommand not pinned by hash: .github/workflows/test.yml:74","Info:   0 out of  11 GitHub-owned 
GitHubAction dependencies pinned","Info:   0 out of   5 third-party GitHubAction dependencies pinned","Info:   0 out of   2 containerImage dependencies pinned","Info:   0 out of  18 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-20T11:47:34.463Z","repository_id":40643734,"created_at":"2025-08-20T11:47:34.463Z","updated_at":"2025-08-20T11:47:34.463Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413944,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:16:59.381Z","status":"ssl_error","status_checked_at":"2026-01-14T08:13:45.490Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-reinforcement-learning","imitation-learning","inverse-reinforcement-learning","reinforcement-learning","tensorflow","tensorflow2"],"created_at":"2024-07-31T03:00:36.317Z","updated_at":"2026-01-14T08:26:05.334Z","avatar_url":"https://github.com/keiohta.png","language":"Python","funding_links":[],"categories":["Sample Codes / Projects \u003ca name=\"sample\" /\u003e ⛏️📐📁","时间序列","Other 💛💛💛💛💛\u003ca name=\"Other\" /\u003e"],"sub_categories":["Reinforcement Learning \u003ca name=\"RL\" /\u003e🔮","网络服务_其他","强化学习"],"readme":"[![Test](https://github.com/keiohta/tf2rl/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/keiohta/tf2rl/actions/workflows/test.yml)\n[![Coverage Status](https://coveralls.io/repos/github/keiohta/tf2rl/badge.svg?branch=master)](https://coveralls.io/github/keiohta/tf2rl?branch=master)\n[![MIT License](http://img.shields.io/badge/license-MIT-blue.svg?style=flat)](LICENSE)\n[![GitHub issues open](https://img.shields.io/github/issues/keiohta/tf2rl.svg)]()\n[![PyPI version](https://badge.fury.io/py/tf2rl.svg)](https://badge.fury.io/py/tf2rl)\n\n# TF2RL\nTF2RL is a deep reinforcement learning library that implements various deep reinforcement learning algorithms using [TensorFlow 2.x](https://www.tensorflow.org/).\n\n## 1. 
Algorithms\nThe following algorithms are supported:\n\n|                          Algorithm                           | Discrete action | Continuous action |                  Support                   | Category                 |\n| :----------------------------------------------------------: | :-------------: | :---------------: | :----------------------------------------: | ------------------------ |\n| [VPG](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf), [PPO](\u003chttps://arxiv.org/abs/1707.06347\u003e) |       ✓        |         ✓         |  [GAE](https://arxiv.org/abs/1506.02438)   | Model-free On-policy RL  |\n| [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) (including [DDQN](https://arxiv.org/abs/1509.06461), [Prior. DQN](https://arxiv.org/abs/1511.05952), [Duel. DQN](https://arxiv.org/abs/1511.06581), [Distrib. DQN](\u003chttps://arxiv.org/abs/1707.06887\u003e), [Noisy DQN](\u003chttps://arxiv.org/abs/1706.10295\u003e)) |       ✓        |         -         | [ApeX](\u003chttps://arxiv.org/abs/1803.00933\u003e) | Model-free Off-policy RL |\n| [DDPG](https://arxiv.org/abs/1509.02971) (including [TD3](\u003chttps://arxiv.org/abs/1802.09477\u003e), [BiResDDPG](\u003chttps://arxiv.org/abs/1905.01072\u003e)) |       -        |         ✓         | [ApeX](\u003chttps://arxiv.org/abs/1803.00933\u003e) | Model-free Off-policy RL |\n|          [SAC](\u003chttps://arxiv.org/abs/1801.01290\u003e)           |       ✓        |         ✓         | [ApeX](\u003chttps://arxiv.org/abs/1803.00933\u003e) | Model-free Off-policy RL |\n| [CURL](https://arxiv.org/abs/2004.04136), [SAC-AE](https://arxiv.org/abs/1910.01741) |       -        |         ✓         |                     -                      | Model-free Off-policy RL |\n| [MPC](https://arxiv.org/abs/1708.02596), [ME-TRPO](https://arxiv.org/abs/1802.10592) |       ✓        |         ✓         |                     -          
            | Model-based RL           |\n| [GAIL](\u003chttps://arxiv.org/abs/1606.03476\u003e), [GAIfO](\u003chttps://arxiv.org/abs/1807.06158\u003e), [VAIL](\u003chttps://arxiv.org/abs/1810.00821\u003e) (including [Spectral Normalization](\u003chttps://arxiv.org/abs/1802.05957\u003e)) |       ✓        |         ✓         |                     -                      | Imitation Learning       |\n\nThe following papers have been implemented in tf2rl:\n\n- Model-free On-policy RL\n  - [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/vpg.py\u003e)\n  - [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/misc/discount_cumsum.py\u003e)\n  - [Proximal Policy Optimization Algorithms](\u003chttps://arxiv.org/abs/1707.06347\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/ppo.py\u003e)\n- Model-free Off-policy RL\n  - [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [Human-level control through Deep Reinforcement Learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [Dueling Network Architectures for Deep Reinforcement 
Learning](https://arxiv.org/abs/1511.06581), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [A Distributional Perspective on Reinforcement Learning](\u003chttps://arxiv.org/abs/1707.06887\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/dqn.py\u003e)\n  - [Noisy Networks for Exploration](\u003chttps://arxiv.org/abs/1706.10295\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/networks/noisy_dense.py\u003e)\n  - [Distributed Prioritized Experience Replay](\u003chttps://arxiv.org/abs/1803.00933\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/apex.py\u003e)\n  - [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/ddpg.py\u003e)\n  - [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](\u003chttps://arxiv.org/abs/1801.01290\u003e), [Soft Actor-Critic Algorithms and Applications](https://arxiv.org/abs/1812.05905), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/sac.py\u003e)\n  - [Addressing Function Approximation Error in Actor-Critic Methods](\u003chttps://arxiv.org/abs/1802.09477\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/td3.py\u003e)\n  - [Deep Residual Reinforcement Learning](\u003chttps://arxiv.org/abs/1905.01072\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/bi_res_ddpg.py\u003e)\n  - [Soft Actor-Critic for Discrete Action Settings](https://arxiv.org/abs/1910.07207v1), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/sac_discrete.py\u003e)\n  - [Improving Sample Efficiency in Model-Free Reinforcement Learning from Images](https://arxiv.org/abs/1910.01741), [code](https://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/sac_ae.py)\n  - [CURL: Contrastive Unsupervised Representations 
for Reinforcement Learning](https://arxiv.org/abs/2004.04136), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/curl_sac.py\u003e)\n- Model-based RL\n  - [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning](https://arxiv.org/abs/1708.02596), [code](https://github.com/keiohta/tf2rl/blob/master/tf2rl/experiments/mpc_trainer.py)\n  - [Model-Ensemble Trust-Region Policy Optimization](https://arxiv.org/abs/1802.10592), [code](https://github.com/keiohta/tf2rl/blob/master/tf2rl/experiments/me_trpo_trainer.py)\n- Imitation Learning\n  - [Generative Adversarial Imitation Learning](\u003chttps://arxiv.org/abs/1606.03476\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/gail.py\u003e)\n  - [Spectral Normalization for Generative Adversarial Networks](\u003chttps://arxiv.org/abs/1802.05957\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/networks/spectral_norm_dense.py\u003e)\n  - [Generative Adversarial Imitation from Observation](\u003chttps://arxiv.org/abs/1807.06158\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/gail.py\u003e)\n  - [Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow](\u003chttps://arxiv.org/abs/1810.00821\u003e), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/vail.py\u003e)\n\nSome useful techniques are also implemented:\n\n- [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114), [code](https://github.com/keiohta/tf2rl/blob/master/tf2rl/tools/vae.py)\n- [D2RL](https://arxiv.org/abs/2010.09163), [code](\u003chttps://github.com/keiohta/tf2rl/blob/master/tf2rl/algos/d2rl_sac.py\u003e)\n\n## 2. 
Installation\n\nThere are several ways to install tf2rl.\nThe recommended way is \"2.1 Install from PyPI\".\n\nIf TensorFlow is already installed, we try to identify the best\nversion of [TensorFlow Probability](https://www.tensorflow.org/probability).\n\n### 2.1 Install from PyPI\n\nYou can install `tf2rl` from [PyPI](https://pypi.org/project/tf2rl/):\n\n```bash\n$ pip install tf2rl\n```\n\n### 2.2 Install from Source Code\nYou can also install from source:\n\n```bash\n$ git clone https://github.com/keiohta/tf2rl.git tf2rl\n$ cd tf2rl\n$ pip install .\n```\n\n### 2.3 Preinstalled Docker Container\nInstead of installing tf2rl on your (virtual) system, you can use\npreinstalled Docker containers.\n\nOnly the first execution requires time to download the container image.\n\nIn the following commands, replace `\u003cversion\u003e` with the\nversion tag you want to use.\n\n\n#### 2.3.1 CPU Only\n\nThe following command starts the preinstalled container.\n\n```bash\n$ docker run -it ghcr.io/keiohta/tf2rl/cpu:\u003cversion\u003e bash\n```\n\nIf you also want to mount your local directory `/local/dir/path` at\n`/mount/point` in the container:\n\n```bash\n$ docker run -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/cpu:\u003cversion\u003e bash\n```\n\n#### 2.3.2 GPU Support (Linux Only, Experimental)\n\nWARNING: We encountered unresolved errors when running ApeX multiprocess learning.\n\nRequirements\n- Linux\n- NVIDIA GPU\n  - TF2.2 compatible driver\n- Docker 19.03 or later\n\n\nThe following command starts the preinstalled container.\n\n```bash\n$ docker run --gpus all -it ghcr.io/keiohta/tf2rl/nvidia:\u003cversion\u003e bash\n```\n\nIf you also want to mount your local directory `/local/dir/path` at\n`/mount/point` in the container:\n\n\n```bash\n$ docker run --gpus all -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/nvidia:\u003cversion\u003e bash\n```\n\n\nYou can check whether your container sees the GPU correctly from inside\nthe container with the 
following command:\n\n```bash\n$ nvidia-smi\n```\n\n\n## 3. Getting started\nHere is a quick example of how to train a DDPG agent on a Pendulum environment:\n\n```python\nimport gym\nfrom tf2rl.algos.ddpg import DDPG\nfrom tf2rl.experiments.trainer import Trainer\n\n\nparser = Trainer.get_argument()\nparser = DDPG.get_argument(parser)\nargs = parser.parse_args()\n\nenv = gym.make(\"Pendulum-v1\")\ntest_env = gym.make(\"Pendulum-v1\")\npolicy = DDPG(\n    state_shape=env.observation_space.shape,\n    action_dim=env.action_space.high.size,\n    gpu=-1,  # Run on CPU. To run on a GPU, specify the GPU number\n    memory_capacity=10000,\n    max_action=env.action_space.high[0],\n    batch_size=32,\n    n_warmup=500)\ntrainer = Trainer(policy, env, args, test_env=test_env)\ntrainer()\n```\n\nYou can find the implemented algorithms in [examples](https://github.com/keiohta/tf2rl/tree/master/examples).\nFor example, to train a DDPG agent:\n\n```bash\n# You must change directory to avoid importing local files\n$ cd examples\n# For available options, pass --help or read the code\n$ python run_ddpg.py [options]\n```\n\nYou can view the training progress and results in TensorBoard as follows:\n\n```bash\n# When executing `run_**.py`, its logs are automatically generated under `./results`\n$ tensorboard --logdir results\n```\n\n## 4. 
Usage\nIn basic usage, all you need to do is initialize one of the policy\nclasses and the `Trainer` class.\n\nAs an option, tf2rl supports a command line program style, so that you\ncan also pass configuration parameters as command line arguments.\n\n\n### 4.1 Command Line Program Style\n\nThe `Trainer` class and the policy classes have a class method `get_argument`,\nwhich creates or updates an\n[ArgumentParser](https://docs.python.org/3/library/argparse.html) object.\n\nYou can parse the command line arguments with the\n`ArgumentParser.parse_args` method, which returns a `Namespace` object.\n\nThe policy's constructor options must be extracted from the `Namespace`\nobject explicitly. The `Trainer` constructor accepts the `Namespace`\nobject directly.\n\n```python\nfrom tf2rl.algos.dqn import DQN\nfrom tf2rl.experiments.trainer import Trainer\n\nenv = ... # Create gym.env like environment.\n\nparser = DQN.get_argument(Trainer.get_argument())\nargs = parser.parse_args()\n\npolicy = DQN(enable_double_dqn=args.enable_double_dqn,\n             enable_dueling_dqn=args.enable_dueling_dqn,\n             enable_noisy_dqn=args.enable_noisy_dqn)\ntrainer = Trainer(policy, env, args)\ntrainer()\n```\n\n\n### 4.2 Non Command Line Program Style (e.g. on Jupyter Notebook)\n\n`ArgumentParser` is a poor fit for Jupyter Notebook-like\nenvironments. The `Trainer` constructor also accepts a `dict` as its `args`\nargument instead of a `Namespace` object.\n\n```python\nfrom tf2rl.algos.dqn import DQN\nfrom tf2rl.experiments.trainer import Trainer\n\nenv = ... # Create gym.env like environment.\n\npolicy = DQN( ... )\ntrainer = Trainer(policy, env, {\"max_steps\": int(1e+6), ... })\ntrainer()\n```\n\n### 4.3 Results\nThe `Trainer` class saves logs and models under\n`\u003clogdir\u003e/%Y%m%dT%H%M%S.%f`. The default `logdir` is `\"results\"`, and\nit can be changed with the `--logdir` command line argument or the `\"logdir\"` key in\nthe constructor's `args`.\n\n## 5. 
Citation\n```\n@misc{ota2020tf2rl,\n  author = {Kei Ota},\n  title = {TF2RL},\n  year = {2020},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/keiohta/tf2rl/}}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeiohta%2Ftf2rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeiohta%2Ftf2rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeiohta%2Ftf2rl/lists"}