Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ikostrikov/pytorch-trpo
PyTorch implementation of Trust Region Policy Optimization
- Host: GitHub
- URL: https://github.com/ikostrikov/pytorch-trpo
- Owner: ikostrikov
- License: mit
- Created: 2017-05-26T12:33:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-13T13:59:57.000Z (over 6 years ago)
- Last Synced: 2025-01-28T21:09:25.300Z (9 days ago)
- Topics: continuous-control, deep-learning, deep-reinforcement-learning, mujoco, pytorch, reinforcement-learning, trpo, trust-region-policy-optimization
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 435
- Watchers: 12
- Forks: 91
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# PyTorch implementation of TRPO
Try my implementation of [PPO](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/) (a newer and generally better variant of TRPO) unless you need TRPO for some specific reason.
This is a PyTorch implementation of ["Trust Region Policy Optimization (TRPO)"](https://arxiv.org/abs/1502.05477).
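For reference, each TRPO update (approximately) solves the KL-constrained surrogate problem from the paper:
```
\max_{\theta}\;
\mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
  \!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, \hat{A}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}
  \!\left[ D_{\text{KL}}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \big) \right]
\le \delta
```
where `\hat{A}` is an advantage estimate and `\delta` bounds the average KL step per update.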
This code is mostly ported from the [original implementation by John Schulman](https://github.com/joschu/modular_rl). In contrast to [another implementation of TRPO in PyTorch](https://github.com/mjacar/pytorch-trpo), this implementation uses an exact Hessian-vector product instead of a finite-differences approximation.
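To illustrate that distinction, here is a minimal, self-contained sketch (not the code from this repository; names are illustrative) of an exact Hessian-vector product via double backpropagation in PyTorch, which is the quantity TRPO's conjugate-gradient step repeatedly needs:
```
import torch

def hessian_vector_product(loss_kl, params, vector, damping=1e-2):
    """Exact Hessian-vector product via double backpropagation
    (no finite-differences step).

    loss_kl : scalar tensor, e.g. the mean KL divergence between the
              current policy and a detached copy of itself
    params  : list of parameters the Hessian is taken with respect to
    vector  : flat tensor with as many elements as all params combined
    """
    # First backward pass, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss_kl, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    # Differentiate (grad . vector) once more to obtain H @ vector exactly.
    grad_dot_v = (flat_grad * vector).sum()
    hvp = torch.autograd.grad(grad_dot_v, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])

    # A small damping term keeps the conjugate-gradient solve well conditioned.
    return flat_hvp + damping * vector
```
Because TRPO only ever needs the Hessian applied to a vector inside conjugate gradient, the full Hessian is never materialized.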
## Contributions
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.
## Usage
```
python main.py --env-name "Reacher-v1"
```
## Recommended hyperparameters
- InvertedPendulum-v1: 5000
- Reacher-v1, InvertedDoublePendulum-v1: 15000
- HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000
- Ant-v1, Humanoid-v1: 50000
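If these values are per-update batch sizes, and assuming `main.py` exposes a `--batch-size` flag (check `python main.py --help`), a small launcher like the following could sweep them. This is a hypothetical sketch, not part of this repository:
```
import subprocess

# Recommended values from the list above, interpreted as batch sizes.
RECOMMENDED_BATCH_SIZES = {
    "InvertedPendulum-v1": 5000,
    "Reacher-v1": 15000,
    "InvertedDoublePendulum-v1": 15000,
    "HalfCheetah-v1": 25000,
    "Hopper-v1": 25000,
    "Swimmer-v1": 25000,
    "Walker2d-v1": 25000,
    "Ant-v1": 50000,
    "Humanoid-v1": 50000,
}

for env_name, batch_size in RECOMMENDED_BATCH_SIZES.items():
    # Launch one training run per environment with its recommended batch size.
    subprocess.run(
        ["python", "main.py",
         "--env-name", env_name,
         "--batch-size", str(batch_size)],
        check=True,
    )
```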
## Results
More or less similar to the original code; detailed plots coming soon.
## Todo
- [ ] Plots.
- [ ] Collect data in multiple threads.