Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ikostrikov/pytorch-trpo
PyTorch implementation of Trust Region Policy Optimization
- Host: GitHub
- URL: https://github.com/ikostrikov/pytorch-trpo
- Owner: ikostrikov
- License: mit
- Created: 2017-05-26T12:33:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-13T13:59:57.000Z (over 6 years ago)
- Last Synced: 2025-01-28T21:09:25.300Z (9 days ago)
- Topics: continuous-control, deep-learning, deep-reinforcement-learning, mujoco, pytorch, reinforcement-learning, trpo, trust-region-policy-optimization
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 435
- Watchers: 12
- Forks: 91
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# PyTorch implementation of TRPO
Try my implementation of [PPO](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/) (a newer and generally better variant of TRPO) unless you need TRPO for some specific reason.
This is a PyTorch implementation of ["Trust Region Policy Optimization (TRPO)"](https://arxiv.org/abs/1502.05477).
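For reference, each TRPO update (approximately) solves the KL-constrained surrogate problem from the paper:
```
\max_{\theta}\;
\mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
  \!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, \hat{A}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}
  \!\left[ D_{\text{KL}}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \big) \right]
\le \delta
```
where `\hat{A}` is an advantage estimate and `\delta` bounds the average KL step per update.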
This code is mostly ported from the [original implementation by John Schulman](https://github.com/joschu/modular_rl). In contrast to [another implementation of TRPO in PyTorch](https://github.com/mjacar/pytorch-trpo), this implementation uses an exact Hessian-vector product instead of a finite-differences approximation.
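To illustrate that distinction, here is a minimal, self-contained sketch (not the code from this repository; names are illustrative) of an exact Hessian-vector product via double backpropagation in PyTorch, which is the quantity TRPO's conjugate-gradient step repeatedly needs:
```
import torch

def hessian_vector_product(loss_kl, params, vector, damping=1e-2):
    """Exact Hessian-vector product via double backpropagation
    (no finite-differences step).

    loss_kl : scalar tensor, e.g. the mean KL divergence between the
              current policy and a detached copy of itself
    params  : list of parameters the Hessian is taken with respect to
    vector  : flat tensor with as many elements as all params combined
    """
    # First backward pass, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss_kl, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    # Differentiate (grad . vector) once more to obtain H @ vector exactly.
    grad_dot_v = (flat_grad * vector).sum()
    hvp = torch.autograd.grad(grad_dot_v, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])

    # A small damping term keeps the conjugate-gradient solve well conditioned.
    return flat_hvp + damping * vector
```
Because TRPO only ever needs the Hessian applied to a vector inside conjugate gradient, the full Hessian is never materialized.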
## Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
## Usage
```
python main.py --env-name "Reacher-v1"
```
## Recommended hyperparameters
- InvertedPendulum-v1: 5000
- Reacher-v1, InvertedDoublePendulum-v1: 15000
- HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000
- Ant-v1, Humanoid-v1: 50000
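If these values are per-update batch sizes, and assuming `main.py` exposes a `--batch-size` flag (check `python main.py --help`), a small launcher like the following could sweep them. This is a hypothetical sketch, not part of this repository:
```
import subprocess

# Recommended values from the list above, interpreted as batch sizes.
RECOMMENDED_BATCH_SIZES = {
    "InvertedPendulum-v1": 5000,
    "Reacher-v1": 15000,
    "InvertedDoublePendulum-v1": 15000,
    "HalfCheetah-v1": 25000,
    "Hopper-v1": 25000,
    "Swimmer-v1": 25000,
    "Walker2d-v1": 25000,
    "Ant-v1": 50000,
    "Humanoid-v1": 50000,
}

for env_name, batch_size in RECOMMENDED_BATCH_SIZES.items():
    # Launch one training run per environment with its recommended batch size.
    subprocess.run(
        ["python", "main.py",
         "--env-name", env_name,
         "--batch-size", str(batch_size)],
        check=True,
    )
```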
## Results
More or less similar to the original code; detailed plots coming soon.
## Todo
- [ ] Plots.
- [ ] Collect data in multiple threads.