https://github.com/upkie/ppo_balancer
Train a balancing policy for Upkie by reinforcement learning
legged-robots locomotion python reinforcement-learning robotics wheeled-biped
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/upkie/ppo_balancer
- Owner: upkie
- License: apache-2.0
- Created: 2024-02-26T11:04:54.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-10-16T12:25:04.000Z (3 months ago)
- Last Synced: 2024-10-18T09:54:20.505Z (3 months ago)
- Topics: legged-robots, locomotion, python, reinforcement-learning, robotics, wheeled-biped
- Language: Python
- Homepage:
- Size: 197 KB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# PPO balancer
The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the [MPC balancer](https://github.com/upkie/mpc_balancer) and [PID balancer](https://upkie.github.io/upkie/pid-balancer.html), it balances Upkie with straight legs. Training uses the ``UpkieGroundVelocity`` gym environment and the PPO implementation from [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html).

An overview of the training pipeline is given in this video: [Sim-to-real RL pipeline for Upkie wheeled bipeds](https://www.youtube.com/shorts/bvWgYso1dzI).

## Installation
```console
conda env create -f environment.yaml
conda activate ppo_balancer
```

## Running a policy

### On your machine
To run the default policy:
```console
make test_policy
```

This assumes the spine is already up and running, for instance after running ``./start_simulation.sh`` from [upkie](https://github.com/upkie/upkie) on your machine, or after starting a pi3hat spine on the robot.

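
For readers wondering what running a policy involves, the sketch below steps a policy through an environment and sums the rewards. It is not the code of ``ppo_balancer/run.py``: the ``rollout`` function and its arguments are hypothetical. It only assumes that the environment follows the Gymnasium ``reset``/``step`` API and that the policy object exposes a ``predict`` method like Stable Baselines3 models do.

```python
def rollout(env, policy, max_steps=1000):
    """Run a policy in an environment and return the sum of rewards.

    Assumptions (not taken from ppo_balancer): ``env`` follows the Gymnasium
    API and ``policy`` has a Stable Baselines3-style ``predict`` method.
    """
    observation, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # deterministic=True asks the policy for its action without
        # exploration noise
        action, _state = policy.predict(observation, deterministic=True)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break  # episode is over
    return total_reward
```

With Stable Baselines3 installed, a policy saved as a ZIP file can be loaded with ``PPO.load(path)`` and passed as the ``policy`` argument.
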
To run a policy saved to a custom path, use for instance:
```console
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```

### On a real robot

To build and upload your policy to the robot:
```console
$ make build
$ make upload
```

Then, SSH into the robot and run the following target:

```console
$ ssh your-upkie
user@your-upkie:~$ make run_ppo_balancer
```

This will run the policy saved at the default path. To run a custom policy, save its ZIP file to ``ppo_balancer/policy/params.zip`` (save its operative config as well) and follow the same steps.

## Training a new policy
First, check that training progresses one rollout at a time:
```console
make train_and_show
```

Once this works, you can train for real, with more environments and no GUI:

```console
make train
```

Check out the ``time/fps`` plots in the command line or in TensorBoard to adjust the number of parallel environments:

```console
make tensorboard
```

Increase the number of environments from its default value (``NB_TRAINING_ENVS`` in the Makefile) for as long as the FPS keeps going up.

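
The rule of thumb above can be sketched as a small helper. This is an illustration only: the function name ``pick_nb_envs`` is hypothetical and the FPS values are made-up numbers, to be replaced by the ``time/fps`` values you read for each number of environments.

```python
def pick_nb_envs(fps_by_nb_envs, min_gain=0.0):
    """Return the largest number of environments before FPS stops rising.

    ``fps_by_nb_envs`` maps a number of parallel environments to the
    ``time/fps`` value observed with it. ``min_gain`` is the relative FPS
    improvement required to keep increasing (0.0 means any improvement).
    """
    counts = sorted(fps_by_nb_envs)
    if not counts:
        raise ValueError("no measurements")
    best = counts[0]
    for nb_envs in counts[1:]:
        if fps_by_nb_envs[nb_envs] <= fps_by_nb_envs[best] * (1.0 + min_gain):
            break  # FPS no longer goes up: stop increasing
        best = nb_envs
    return best


# Made-up measurements, for illustration only
measurements = {1: 300.0, 2: 580.0, 4: 1050.0, 8: 1400.0, 16: 1350.0}
print(pick_nb_envs(measurements))
```

With these example numbers, FPS drops when going from 8 to 16 environments, so the helper settles on 8.
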
## Troubleshooting
### Shared object file not found
**Symptom:** you are getting errors related to PyTorch not finding shared object files, with a call to ``_preload_cuda_deps()`` somewhere in the traceback:
```
  File ".../torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File ".../torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: .../nvidia/cublas/lib/libcublas.so.11: cannot open shared object file: No such file or directory
```

**Workaround:** run ``pip install torch`` in your local pip environment. This will override the PyTorch provided by Bazel and allow you to train and run normally.
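
To see which PyTorch installation a given Python interpreter would pick up, a quick check along these lines may help. It is a minimal sketch using only the standard library, and the helper name ``find_module_origin`` is hypothetical. It looks the module up without importing it, so it does not trigger the error above.

```python
import importlib.util


def find_module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path to torch/__init__.py, or None if torch is not installed
print(find_module_origin("torch"))
```

Run it with the same interpreter you use for training: the printed path indicates whether that interpreter finds the pip-installed package or another copy.
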