https://github.com/professornova/ppo-humanoid
PPO implementation for controlling a humanoid in Gymnasium's Mujoco environment, featuring customizable training scripts and multi-environment parallel training.
- Host: GitHub
- URL: https://github.com/professornova/ppo-humanoid
- Owner: ProfessorNova
- Created: 2024-08-22T17:14:10.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-06T06:53:36.000Z (3 months ago)
- Last Synced: 2025-03-06T07:45:51.178Z (3 months ago)
- Topics: artificial-intelligence, gymnasium, humanoid-walking, mujoco-environments, proximal-policy-optimization
- Language: Python
- Homepage:
- Size: 4.62 MB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# PPO-Humanoid
This repository contains the implementation of a Proximal Policy Optimization (PPO) agent to control a humanoid in the
Gymnasium (formerly OpenAI Gym) Mujoco environment. The agent is trained to master complex humanoid locomotion using deep
reinforcement learning.

---
## Results

Here is a demonstration of the agent's performance after training for 3000 epochs on the Humanoid-v4 environment.
---
## Installation
To get started with this project, follow these steps:
1. **Clone the Repository**:
```bash
git clone https://github.com/ProfessorNova/PPO-Humanoid.git
cd PPO-Humanoid
```

2. **Set Up Python Environment**:
Make sure you have Python installed (tested with Python 3.10.11).

3. **Install Dependencies**:
Run the following command to install the required packages:
```bash
pip install -r req.txt
```

For proper PyTorch installation, visit [pytorch.org](https://pytorch.org/get-started/locally/) and follow the
instructions based on your system configuration.

4. **Install Gymnasium Mujoco**:
You need to install the Mujoco environment to simulate the humanoid (a quick verification snippet follows these steps):
```bash
pip install gymnasium[mujoco]
```

5. **Train the Model (PPO)**:
To start training the model, run:
```bash
python train_ppo.py
```

6. **Monitor Training Progress**:
You can monitor the training progress by viewing the videos in the `videos` folder or by looking at the graphs in
TensorBoard:
```bash
tensorboard --logdir "logs"
```
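Before launching a long training run, it can help to confirm that the Mujoco backend from step 4 works. Below is a small standalone snippet (not part of this repository); it assumes Humanoid-v5 is available in the installed Gymnasium version (older Gymnasium releases ship Humanoid-v4 instead):

```python
# Minimal smoke test for the Gymnasium Mujoco installation (not part of this repo).
import gymnasium as gym

env = gym.make("Humanoid-v5")               # requires gymnasium[mujoco]
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()      # random torques, just to exercise the simulator
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

---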
## Description
### Overview
This project implements a reinforcement learning agent using the Proximal Policy Optimization (PPO) algorithm, a popular
method for continuous control tasks. The agent is designed to learn how to control a humanoid robot in a simulated
environment.

### Key Components
- **Agent**: The core neural network model that outputs both the policy (an action distribution) and value estimates (see the sketch after this list).
- **Environment**: The Humanoid-v5 environment from the Gymnasium Mujoco suite, which provides a realistic physics
simulation for testing control algorithms.
- **Buffer**: A class for storing trajectories (observations, actions, rewards, etc.) that the agent collects during
interaction with the environment. This data is later used to calculate advantages and train the model.
- **Training Script**: The `train_ppo.py` script handles the training loop, including collecting data, updating the
model, and logging results.
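The Agent and Buffer described above map onto fairly standard PPO building blocks for continuous control. The following is a minimal, hypothetical sketch; class names, layer sizes, and coefficients are illustrative and are not taken from this repository's code:

```python
# Illustrative actor-critic and GAE helpers for continuous-control PPO.
# Names and sizes are assumptions, not this repository's actual implementation.
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),               # mean of a Gaussian policy
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std
        self.value = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                     # scalar state-value estimate
        )

    def forward(self, obs: torch.Tensor):
        dist = torch.distributions.Normal(self.policy(obs), self.log_std.exp())
        return dist, self.value(obs).squeeze(-1)


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a rollout of shape (T, n_envs)."""
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_value)
    next_value = last_value
    for t in reversed(range(rewards.shape[0])):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values                     # targets for the value head
    return advantages, returns
```

---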
## Usage
### Training
You can customize the training by modifying the command-line arguments:
- `--n-envs`: Number of environments to run in parallel (default: 32).
- `--n-epochs`: Number of epochs to train the model (default: 3000).
- `--n-steps`: Number of steps per environment per epoch (default: 2048).
- `--batch-size`: Batch size for training (default: 16384).
- `--train-iters`: Number of training iterations per epoch (default: 20).

For example:
```bash
python train_ppo.py --n-envs 64 --batch-size 4096 --train-iters 30 --cuda
```

All hyperparameters can be viewed either with `python train_ppo.py --help` or by looking at the
`parse_args_ppo()` function in `lib/utils.py`.
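The `--n-envs` flag above controls how many rollout environments are stepped in parallel. As a rough illustration (the repository's own environment setup in `train_ppo.py` may differ), such parallel environments can be created with Gymnasium's vector API:

```python
# Illustrative parallel-environment setup; the repository may construct its envs differently.
import gymnasium as gym

N_ENVS = 32  # mirrors the --n-envs default

if __name__ == "__main__":
    # AsyncVectorEnv runs each environment in its own worker process.
    envs = gym.vector.AsyncVectorEnv(
        [lambda: gym.make("Humanoid-v5") for _ in range(N_ENVS)]
    )
    obs, info = envs.reset(seed=0)            # obs has shape (N_ENVS, obs_dim)
    actions = envs.action_space.sample()      # one random action per environment
    obs, rewards, terminated, truncated, info = envs.step(actions)
    envs.close()
```

Observations, rewards, and termination flags come back batched, one row per environment.

---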
### Statistics
### Performance Metrics:
The following charts provide insights into the performance during training with the current default hyperparameters
(Note: after updating to the Humanoid-v5 environment, I only trained for 1000 epochs; the results are still promising and
should match the previous results with more training). A sketch of how these logged quantities are commonly computed appears after the list:

- **Reward**:
- **Policy Loss**:
- **Value Loss**:
- **Entropy**:
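For reference, these logged quantities correspond to the standard PPO terms. The sketch below shows how they are commonly computed; the function name, arguments, and the clipping coefficient are illustrative assumptions rather than code from this repository:

```python
# Common PPO loss terms (policy loss, value loss, entropy); illustrative only.
import torch
import torch.nn.functional as F


def ppo_losses(dist, actions, old_log_probs, advantages, values, returns,
               clip_eps: float = 0.2):
    log_probs = dist.log_prob(actions).sum(-1)       # Gaussian policy over joint torques
    ratio = torch.exp(log_probs - old_log_probs)     # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = F.mse_loss(values, returns)         # drives the value head toward returns
    entropy = dist.entropy().sum(-1).mean()          # higher entropy = more exploration
    return policy_loss, value_loss, entropy
```

The total training loss is typically a weighted sum such as `policy_loss + c1 * value_loss - c2 * entropy`; the entropy term usually decays slowly over training as the policy becomes more deterministic.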
