https://github.com/upkie/ppo_balancer
Train a balancing policy for Upkie by reinforcement learning
- Host: GitHub
- URL: https://github.com/upkie/ppo_balancer
- Owner: upkie
- License: apache-2.0
- Created: 2024-02-26T11:04:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-06T15:12:44.000Z (3 months ago)
- Last Synced: 2025-04-04T12:01:42.376Z (2 months ago)
- Topics: legged-robots, locomotion, python, reinforcement-learning, robotics, wheeled-biped
- Language: Python
- Homepage:
- Size: 217 KB
- Stars: 6
- Watchers: 1
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PPO balancer
[upkie v6.0.0](https://github.com/upkie/upkie/tree/v6.0.0)
The PPO balancer is a feedforward neural network policy trained by reinforcement learning with a sim-to-real pipeline. Like the [MPC balancer](https://github.com/upkie/mpc_balancer) and [PID balancer](https://upkie.github.io/upkie/pid-balancer.html), it balances Upkie with straight legs. Training uses the `UpkieGroundVelocity` gym environment and the PPO implementation from [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html). An overview of the training pipeline is given in this video: [Sim-to-real RL pipeline for Upkie wheeled bipeds](https://www.youtube.com/shorts/bvWgYso1dzI).
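For orientation, training boils down to running PPO on that environment. The sketch below shows the core calls under stated assumptions: the environment ID, the `upkie.envs.register()` call and all hyperparameters are illustrative, not the repository's actual training configuration; see the training script in this repository for the real setup.

```python
import gymnasium as gym
from stable_baselines3 import PPO

import upkie.envs

upkie.envs.register()  # assumed: registers Upkie environments with Gymnasium

# A spine (e.g. the Bullet simulation spine) must be running for the
# environment to connect to. The environment ID is an assumption; check
# upkie.envs for the version shipped with your upkie package.
with gym.make("UpkieGroundVelocity-v3") as env:
    policy = PPO("MlpPolicy", env, verbose=1)
    policy.learn(total_timesteps=100_000)
    policy.save("final.zip")  # illustrative output path
```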
## Installation
### On your machine
```console
conda env create -f environment.yaml
conda activate ppo_balancer
```

### On your Upkie
The PPO balancer uses [pixi](https://pixi.sh/latest/#installation) and [pixi-pack](https://github.com/Quantco/pixi-pack/releases) to pack a standalone Python environment for running policies on your Upkie. First, create `environment.tar` and upload it to the robot:
```console
make pack_pixi_env
make upload
```

Then, unpack the remote environment:
```console
$ ssh user@your-upkie
user@your-upkie:~$ cd ppo_balancer
user@your-upkie:ppo_balancer$ make unpack_pixi_env
```

## Running a policy
### On your machine
To run the default policy:
```console
make run_agent
```

This assumes the spine is already up and running, for instance after launching `./start_simulation.sh` on your machine, or after starting a pi3hat spine on the robot.
To run a policy saved to a custom path, call `run.py` directly, for instance:
```console
python ppo_balancer/run.py --policy ppo_balancer/training/2023-11-15/final.zip
```

### On your Upkie
Once the agent and Python environment have been uploaded with the instructions above, you can SSH into the robot and run the same target:
```console
$ ssh user@your-upkie
user@your-upkie:~$ make run_agent
```

This will run the policy saved at the default path. To run a custom policy, copy its ZIP file to the robot (save its operative configuration as well for future reference) and pass its path as an argument to `run.py`.
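For reference, running a saved policy amounts to loading the ZIP file with Stable Baselines3 and stepping the environment with deterministic actions. This is a minimal sketch of that loop, not the repository's `run.py`: the environment ID, registration call and file path are assumptions.

```python
import gymnasium as gym
from stable_baselines3 import PPO

import upkie.envs

upkie.envs.register()  # assumed registration call, as in the training sketch

# Illustrative path: point it to the policy ZIP file you copied over.
policy = PPO.load("ppo_balancer/training/2023-11-15/final.zip")

with gym.make("UpkieGroundVelocity-v3") as env:  # env ID is an assumption
    observation, _ = env.reset()
    while True:  # runs until interrupted
        # Deterministic inference: take the mean of the action distribution.
        action, _ = policy.predict(observation, deterministic=True)
        observation, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            observation, _ = env.reset()
```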
## Training a new policy
First, check that training progresses one rollout at a time:
```console
make train_and_show
```

Once this works, you can train for real, with more environments and no GUI:
```console
make train
```

Check out the `time/fps` plots in the command line or in TensorBoard to adjust the number of parallel environments:
```console
make tensorboard
```

You should increase the number of parallel environments from the default value (`NB_TRAINING_ENVS` in the Makefile) as long as the FPS keeps going up.
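The number of parallel environments corresponds to the `n_envs` argument of a vectorized environment in Stable Baselines3. As a hedged sketch (the environment ID and registration call are assumptions, as above), parallel training looks like:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

import upkie.envs

upkie.envs.register()  # assumed registration call

# Each subprocess steps its own simulation; raise n_envs until the
# time/fps curve stops improving on your machine.
vec_env = make_vec_env(
    "UpkieGroundVelocity-v3",  # environment ID is an assumption
    n_envs=4,
    vec_env_cls=SubprocVecEnv,
)
policy = PPO("MlpPolicy", vec_env, verbose=1, tensorboard_log="./logs")
policy.learn(total_timesteps=1_000_000)
```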
## See also
- [Why aren't simulations deterministic when the policy is deterministic?](https://github.com/orgs/upkie/discussions/471)
- [Error: Shared object file not found](https://github.com/upkie/ppo_balancer/issues/8)
- [Packing pixi environments for the Raspberry Pi](https://github.com/orgs/upkie/discussions/467)