https://github.com/upkie/ppo_balancer
Train a balancing policy for Upkie by reinforcement learning
- Host: GitHub
- URL: https://github.com/upkie/ppo_balancer
- Owner: upkie
- License: apache-2.0
- Created: 2024-02-26T11:04:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-06T15:12:44.000Z (3 months ago)
- Last Synced: 2025-04-04T12:01:42.376Z (2 months ago)
- Topics: legged-robots, locomotion, python, reinforcement-learning, robotics, wheeled-biped
- Language: Python
- Homepage:
- Size: 217 KB
- Stars: 6
- Watchers: 1
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PPO balancer
[upkie v6.0.0](https://github.com/upkie/upkie/tree/v6.0.0)
The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the [MPC balancer](https://github.com/upkie/mpc_balancer) and [PID balancer](https://upkie.github.io/upkie/pid-balancer.html), it balances Upkie with straight legs. Training uses the `UpkieGroundVelocity` gym environment and the PPO implementation from [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). An overview of the training pipeline is given in this video: [Sim-to-real RL pipeline for Upkie wheeled bipeds](https://www.youtube.com/shorts/bvWgYso1dzI).
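For orientation, training boils down to running PPO on that environment. The sketch below shows the core calls under stated assumptions: the environment ID, the `upkie.envs.register()` call and all hyperparameters are illustrative, not the repository's actual training configuration; see the training script in this repository for the real setup.

```python
import gymnasium as gym
from stable_baselines3 import PPO

import upkie.envs

upkie.envs.register()  # assumed: registers Upkie environments with Gymnasium

# A spine (e.g. the Bullet simulation spine) must be running for the
# environment to connect to. The environment ID is an assumption; check
# upkie.envs for the version shipped with your upkie package.
with gym.make("UpkieGroundVelocity-v3") as env:
    policy = PPO("MlpPolicy", env, verbose=1)
    policy.learn(total_timesteps=100_000)
    policy.save("final.zip")  # illustrative output path
```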
## Installation
### On your machine
```console
conda env create -f environment.yaml
conda activate ppo_balancer
```

### On your Upkie
The PPO balancer uses [pixi](https://pixi.sh/latest/#installation) and [pixi-pack](https://github.com/Quantco/pixi-pack/releases) to pack a standalone Python environment for running policies on your Upkie. First, create `environment.tar` and upload it to the robot:
```console
make pack_pixi_env
make upload
```

Then, unpack the remote environment:
```console
$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_pixi_env
```

## Running a policy
### On your machine
To run the default policy:
```console
make run_agent
```

This assumes the spine is already up and running, for instance after launching `./start_simulation.sh` on your machine, or after starting a pi3hat spine on the robot.
To run a policy saved to a custom path, call `run.py` directly, for instance:
```console
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```

### On your Upkie
Once the agent and Python environment have been uploaded with the instructions above, you can SSH into the robot and run the same target:
```console
$ ssh user@your-upkie
user@your-upkie:~$ make run_agent
```

This will run the policy saved at the default path. To run a custom policy, copy its ZIP file to the robot (save its operative configuration as well for future reference) and pass its path as an argument to `run.py`.
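For reference, running a saved policy amounts to loading the ZIP file with Stable Baselines3 and stepping the environment with deterministic actions. This is a minimal sketch of that loop, not the repository's `run.py`: the environment ID, registration call and file path are assumptions.

```python
import gymnasium as gym
from stable_baselines3 import PPO

import upkie.envs

upkie.envs.register()  # assumed registration call, as in the training sketch

# Illustrative path: point it to the policy ZIP file you copied over.
policy = PPO.load("ppo_balancer/training/2023-11-15/final.zip")

with gym.make("UpkieGroundVelocity-v3") as env:  # env ID is an assumption
    observation, _ = env.reset()
    while True:  # runs until interrupted
        # Deterministic inference: take the mean of the action distribution.
        action, _ = policy.predict(observation, deterministic=True)
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            observation, _ = env.reset()
```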
## Training a new policy
First, check that training progresses one rollout at a time:
```console
make train_and_show
```

Once this works, you can train for real, with more environments and no GUI:
```console
make train
```

Check out the `time/fps` plots in the command line or in TensorBoard to adjust the number of parallel environments:
```console
make tensorboard
```

You should increase the number of parallel environments from the default value (`NB_TRAINING_ENVS` in the Makefile) as long as the FPS keeps going up.
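The number of parallel environments corresponds to the `n_envs` argument of a vectorized environment in Stable Baselines3. As a hedged sketch (the environment ID and registration call are assumptions, as above), parallel training looks like:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

import upkie.envs

upkie.envs.register()  # assumed registration call

# Each subprocess steps its own simulation; raise n_envs until the
# time/fps curve stops improving on your machine.
vec_env = make_vec_env(
    "UpkieGroundVelocity-v3",  # environment ID is an assumption
    n_envs=4,
    vec_env_cls=SubprocVecEnv,
)
policy = PPO("MlpPolicy", vec_env, verbose=1, tensorboard_log="./logs")
policy.learn(total_timesteps=1_000_000)
```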
## See also
- [Why aren't simulations deterministic when the policy is deterministic?](https://github.com/orgs/upkie/discussions/471)
- [Error: Shared object file not found](https://github.com/upkie/ppo_balancer/issues/8)
- [Packing pixi environments for the Raspberry Pi](https://github.com/orgs/upkie/discussions/467)