{"id":13936457,"url":"https://github.com/google-research/batch-ppo","last_synced_at":"2025-04-08T17:16:59.342Z","repository":{"id":57414133,"uuid":"102891202","full_name":"google-research/batch-ppo","owner":"google-research","description":"Efficient Batched Reinforcement Learning in TensorFlow","archived":false,"fork":false,"pushed_at":"2019-01-11T04:47:24.000Z","size":151,"stargazers_count":966,"open_issues_count":5,"forks_count":147,"subscribers_count":67,"default_branch":"master","last_synced_at":"2025-04-01T16:16:59.784Z","etag":null,"topics":["artificial-intelligence","control","multi-processing","python","reinforcement-learning","tensorflow","vectorized-computation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-08T18:14:29.000Z","updated_at":"2025-03-30T05:43:44.000Z","dependencies_parsed_at":"2022-08-26T20:12:59.808Z","dependency_job_id":null,"html_url":"https://github.com/google-research/batch-ppo","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch-ppo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch-ppo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch-ppo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fbatch-ppo/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/batch-ppo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247888559,"owners_count":21013001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","control","multi-processing","python","reinforcement-learning","tensorflow","vectorized-computation"],"created_at":"2024-08-07T23:02:41.296Z","updated_at":"2025-04-08T17:16:59.314Z","avatar_url":"https://github.com/google-research.png","language":"Python","readme":"\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_transp.png\" width=25% align=\"right\"\u003e\n\nBatch PPO\n=========\n\nThis project provides optimized infrastructure for reinforcement learning. It\nextends the [OpenAI gym interface][post-gym] to multiple parallel environments\nand allows agents to be implemented in TensorFlow and perform batched\ncomputation. 
As a starting point, we provide BatchPPO, an optimized\nimplementation of [Proximal Policy Optimization][post-ppo].\n\nPlease cite the [TensorFlow Agents paper][paper-agents] if you use code from\nthis project in your research:\n\n```bibtex\n@article{hafner2017agents,\n  title={TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow},\n  author={Hafner, Danijar and Davidson, James and Vanhoucke, Vincent},\n  journal={arXiv preprint arXiv:1709.02878},\n  year={2017}\n}\n```\n\nDependencies: Python 2/3, TensorFlow 1.3+, Gym, ruamel.yaml\n\n[paper-agents]: https://arxiv.org/pdf/1709.02878.pdf\n[post-gym]: https://blog.openai.com/openai-gym-beta/\n[post-ppo]: https://blog.openai.com/openai-baselines-ppo/\n\nInstructions\n------------\n\nClone the repository and run the PPO algorithm by typing:\n\n```shell\npython3 -m agents.scripts.train --logdir=/path/to/logdir --config=pendulum\n```\n\nThe algorithm to use is defined in the configuration; the `pendulum`\nconfiguration used here runs the included PPO implementation. You can find\nmore pre-defined configurations in `agents/scripts/configs.py`.\n\nIf you want to resume a previously started run, add the `--timestamp=\u003ctime\u003e`\nflag to the last command and provide the timestamp in the directory name of\nyour run.\n\nTo visualize metrics, start TensorBoard from another terminal, then point your\nbrowser to `http://localhost:2222`:\n\n```shell\ntensorboard --logdir=/path/to/logdir --port=2222\n```\n\nTo render videos and gather OpenAI Gym statistics to upload to the scoreboard,\ntype:\n\n```shell\npython3 -m agents.scripts.visualize --logdir=/path/to/logdir/\u003ctime\u003e-\u003cconfig\u003e --outdir=/path/to/outdir/\n```\n\nModifications\n-------------\n\nWe release this project as a starting point that makes it easy to implement new\nreinforcement learning ideas. 
These files are good places to start when\nmodifying the code:\n\n| File | Content |\n| ---- | ------- |\n| `scripts/configs.py` | Experiment configurations specifying the tasks and algorithms. |\n| `scripts/networks.py` | Neural network models. |\n| `scripts/train.py` | The executable file containing the training setup. |\n| `algorithms/ppo/ppo.py` | The TensorFlow graph for the PPO algorithm. |\n\nTo run unit tests and linting, type:\n\n```shell\npython2 -m unittest discover -p \"*_test.py\"\npython3 -m unittest discover -p \"*_test.py\"\npython3 -m pylint agents\n```\n\nFor further questions, please open an issue on GitHub.\n\nImplementation\n--------------\n\nWe include a batched interface for OpenAI Gym environments that fully integrates\nwith TensorFlow for efficient algorithm implementations. This is achieved\nthrough these core components:\n\n- **`agents.tools.wrappers.ExternalProcess`** is an environment wrapper that\n  constructs an OpenAI Gym environment inside an external process. Calls to\n  `step()` and `reset()`, as well as attribute access, are forwarded to the\n  process and wait for the result. This makes it possible to run multiple\n  environments in parallel without being restricted by Python's global\n  interpreter lock.\n- **`agents.tools.BatchEnv`** extends the OpenAI Gym interface to batches of\n  environments. It combines multiple OpenAI Gym environments, with `step()`\n  accepting a batch of actions and returning a batch of observations, rewards,\n  done flags, and info objects. If the individual environments live in external\n  processes, they will be stepped in parallel.\n- **`agents.tools.InGraphBatchEnv`** integrates a batch environment into the\n  TensorFlow graph and makes its `step()` and `reset()` functions accessible as\n  operations. 
The current batch of observations, last actions, rewards, and done\n  flags is stored in variables and made available as tensors.\n- **`agents.tools.simulate()`** fuses the step of an in-graph batch environment\n  and a reinforcement learning algorithm together into a single operation to be\n  called inside the training loop. This reduces the number of session calls and\n  provides a simple way to train future algorithms.\n\nTo understand all the code, please familiarize yourself with TensorFlow's\ncontrol flow operations, especially [`tf.cond()`][tf-cond],\n[`tf.scan()`][tf-scan], and\n[`tf.control_dependencies()`][tf-control-dependencies].\n\n[tf-cond]: https://www.tensorflow.org/api_docs/python/tf/cond\n[tf-scan]: https://www.tensorflow.org/api_docs/python/tf/scan\n[tf-control-dependencies]: https://www.tensorflow.org/api_docs/python/tf/control_dependencies\n\nDisclaimer\n----------\n\nThis is not an official Google product.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbatch-ppo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fbatch-ppo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fbatch-ppo/lists"}