https://github.com/ikostrikov/pytorch-trpo
PyTorch implementation of Trust Region Policy Optimization
- Host: GitHub
- URL: https://github.com/ikostrikov/pytorch-trpo
- Owner: ikostrikov
- License: mit
- Created: 2017-05-26T12:33:18.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-09-13T13:59:57.000Z (about 7 years ago)
- Last Synced: 2025-03-30T09:07:56.589Z (8 months ago)
- Topics: continuous-control, deep-learning, deep-reinforcement-learning, mujoco, pytorch, reinforcement-learning, trpo, trust-region-policy-optimization
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 439
- Watchers: 11
- Forks: 90
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- Awesome-pytorch-list-CNVersion - pytorch-trpo (Hessian-vector product version): uses an exact Hessian-vector product instead of a finite differences approximation. (Paper implementations / Other libraries)
- Awesome-pytorch-list - pytorch-trpo (Hessian-vector product version): uses an exact Hessian-vector product instead of a finite differences approximation. (Paper implementations / Other libraries)
README
# PyTorch implementation of TRPO
Try my implementation of [PPO](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/) (a newer, better-performing variant of TRPO), unless you need TRPO for some specific reason.
This is a PyTorch implementation of ["Trust Region Policy Optimization (TRPO)"](https://arxiv.org/abs/1502.05477).
The code is mostly ported from the [original implementation by John Schulman](https://github.com/joschu/modular_rl). In contrast to [another implementation of TRPO in PyTorch](https://github.com/mjacar/pytorch-trpo), this implementation uses an exact Hessian-vector product instead of a finite differences approximation.
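To illustrate what the exact Hessian-vector product means in practice, here is a minimal sketch (not the code from this repository) of computing Hv with two backward passes through PyTorch autograd (the Pearlmutter trick); `hessian_vector_product`, `loss`, `params`, `vector`, and `damping` are illustrative names:

```python
import torch

def hessian_vector_product(loss, params, vector, damping=1e-2):
    """Compute (H + damping * I) @ vector, where H is the Hessian of `loss`
    with respect to `params`, without ever forming H explicitly."""
    # First backward pass: gradients w.r.t. the parameters, keeping the
    # graph so the result can be differentiated a second time.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    # Dot the gradient with the probe vector and differentiate once more:
    # d(g . v)/dtheta = H v  -- no finite differences needed.
    grad_vector_dot = (flat_grad * vector).sum()
    hvp = torch.autograd.grad(grad_vector_dot, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])

    # A small damping term keeps the conjugate-gradient solve well conditioned.
    return flat_hvp + damping * vector
```

In TRPO the `loss` here would be the mean KL divergence between the old and the updated policy, so the product above is the Fisher-vector product consumed by the conjugate-gradient step that finds the natural-gradient search direction.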
## Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
## Usage
```
python main.py --env-name "Reacher-v1"
```
## Recommended hyperparameters
- InvertedPendulum-v1: 5000
- Reacher-v1, InvertedDoublePendulum-v1: 15000
- HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000
- Ant-v1, Humanoid-v1: 50000
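These numbers appear to be per-update sample budgets. Assuming they map to a `--batch-size` argument of `main.py` (an assumption; check `python main.py --help` for the exact flag name), a run would look like:

```
python main.py --env-name "HalfCheetah-v1" --batch-size 25000
```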
## Results
Results are more or less similar to those of the original code; plots coming soon.
## Todo
- [ ] Plots.
- [ ] Collect data in multiple threads.