{"id":25133106,"url":"https://github.com/professornova/ppo-humanoid","last_synced_at":"2025-04-03T01:24:38.225Z","repository":{"id":254326887,"uuid":"846175545","full_name":"ProfessorNova/PPO-Humanoid","owner":"ProfessorNova","description":"PPO implementation for controlling a humanoid in Gymnasium's Mujoco environment, featuring customizable training scripts and multi-environment parallel training.","archived":false,"fork":false,"pushed_at":"2025-03-06T06:53:36.000Z","size":4840,"stargazers_count":6,"open_issues_count":2,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T07:45:51.178Z","etag":null,"topics":["artificial-intelligence","gymnasium","humanoid-walking","mujoco-environments","proximal-policy-optimization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProfessorNova.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-22T17:14:10.000Z","updated_at":"2025-03-06T06:53:40.000Z","dependencies_parsed_at":"2025-02-09T22:30:27.194Z","dependency_job_id":null,"html_url":"https://github.com/ProfessorNova/PPO-Humanoid","commit_stats":null,"previous_names":["professornova/ppo-humanoid"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProfessorNova%2FPPO-Humanoid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProfessorNova%2FPPO-Humanoid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProfessorNo
va%2FPPO-Humanoid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProfessorNova%2FPPO-Humanoid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProfessorNova","download_url":"https://codeload.github.com/ProfessorNova/PPO-Humanoid/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246918620,"owners_count":20854851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","gymnasium","humanoid-walking","mujoco-environments","proximal-policy-optimization"],"created_at":"2025-02-08T15:34:14.710Z","updated_at":"2025-04-03T01:24:38.217Z","avatar_url":"https://github.com/ProfessorNova.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PPO-Humanoid\n\nThis repository contains an implementation of a Proximal Policy Optimization (PPO) agent that controls a humanoid in the\nGymnasium MuJoCo environment. The agent is trained to master complex humanoid locomotion using deep reinforcement\nlearning.\n\n---\n\n## Results\n\n![Demo Gif](/docs/demo.gif)\n\nHere is a demonstration of the agent's performance after training for 3000 epochs on the Humanoid-v4 environment.\n\n---\n\n## Installation\n\nTo get started with this project, follow these steps:\n\n1. **Clone the Repository**:\n    ```bash\n    git clone https://github.com/ProfessorNova/PPO-Humanoid.git\n    cd PPO-Humanoid\n    ```\n\n2. **Set Up Python Environment**:\n   Make sure you have Python installed (tested with Python 3.10.11).\n\n3. 
**Install Dependencies**:\n   Run the following command to install the required packages:\n    ```bash\n    pip install -r req.txt\n    ```\n\n   For proper PyTorch installation, visit [pytorch.org](https://pytorch.org/get-started/locally/) and follow the\n   instructions based on your system configuration.\n\n4. **Install Gymnasium MuJoCo**:\n   You need to install the MuJoCo environments to simulate the humanoid:\n    ```bash\n    pip install \"gymnasium[mujoco]\"\n    ```\n\n5. **Train the Model (PPO)**:\n   To start training the model, run:\n    ```bash\n    python train_ppo.py\n    ```\n\n6. **Monitor Training Progress**:\n   You can monitor the training progress by viewing the videos in the `videos` folder or by looking at the graphs in\n   TensorBoard:\n    ```bash\n    tensorboard --logdir \"logs\"\n    ```\n\n---\n\n## Description\n\n### Overview\n\nThis project implements a reinforcement learning agent using the Proximal Policy Optimization (PPO) algorithm, a popular\nmethod for continuous control tasks. The agent is designed to learn how to control a humanoid robot in a simulated\nenvironment.\n\n### Key Components\n\n- **Agent**: The core neural network model that outputs both the policy (a distribution over actions) and value estimates.\n- **Environment**: The Humanoid-v5 environment from the Gymnasium MuJoCo suite, which provides a realistic physics\n  simulation for testing control algorithms.\n- **Buffer**: A class for storing trajectories (observations, actions, rewards, etc.) that the agent collects during\n  interaction with the environment. 
This data is later used to calculate advantages and train the model.\n- **Training Script**: The `train_ppo.py` script handles the training loop, including collecting data, updating the\n  model, and logging results.\n\n---\n\n## Usage\n\n### Training\n\nYou can customize training with the following command-line arguments:\n\n- `--n-envs`: Number of environments to run in parallel (default: 32).\n- `--n-epochs`: Number of epochs to train the model (default: 3000).\n- `--n-steps`: Number of steps per environment per epoch (default: 2048).\n- `--batch-size`: Batch size for training (default: 16384).\n- `--train-iters`: Number of training iterations per epoch (default: 20).\n\nFor example:\n\n```bash\npython train_ppo.py --n-envs 64 --batch-size 4096 --train-iters 30 --cuda\n```\n\nAll hyperparameters can be viewed either with `python train_ppo.py --help` or by looking at the\n`parse_args_ppo()` function in `lib/utils.py`.\n\n---\n\n## Statistics\n\n### Performance Metrics\n\nThe following charts show training performance with the current default hyperparameters.\n(Note: After updating to the Humanoid-v5 environment, I trained for only 1000 epochs. The results are still promising and\nshould match the previous results with more training.)\n\n- **Reward**:\n  ![Reward](/docs/reward_mean.svg)\n\n- **Policy Loss**:\n  ![Policy Loss](/docs/loss_policy.svg)\n\n- **Value Loss**:\n  ![Value Loss](/docs/loss_value.svg)\n\n- **Entropy**:\n  ![Entropy](/docs/loss_entropy.svg)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprofessornova%2Fppo-humanoid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprofessornova%2Fppo-humanoid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprofessornova%2Fppo-humanoid/lists"}