# BipedalWalker Reinforcement Learning

This project was developed for the "Introduction to Intelligent Autonomous Systems" course and aims to **develop a reinforcement learning agent using a Gymnasium environment** as a base. The task is to **introduce specific changes or customizations to the environment** and **train BipedalWalker-v3 using the Stable-Baselines3 library**, in order to assess how these **changes impact** the agent's learning process and performance. It was completed in the first semester of the third year of the Bachelor's Degree in Artificial Intelligence and Data Science.


## Programming Language:

Python


## Requirements:

- stable-baselines3[extra]
- swig
- gymnasium[box2d]
- sb3-contrib


## The Standard BipedalWalker:



**States:**
The state is a continuous vector of 24 dimensions, which includes:
- Angle and angular velocity of the hull (main body): 2 values.
- Horizontal and vertical hull speed: 2 values.
- Joint angles and angular velocities of the legs: 8 values (4 joints × 2 values each).
- Ground contact sensors on the legs: 2 values (indicating whether each leg is in contact with the ground).
- Information on the terrain ahead (LIDAR sensors): 10 values (terrain readings to anticipate obstacles).

**Rewards:**
Reward is given for moving forward, totalling 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points.

**Percepts:**
The agent sees everything it needs to (state = perceptions), as the environment is fully observable, eliminating the need to infer hidden information or deal with noise.

**Actions:**
Actions are motor speed values in the [-1, 1] range for each of the 4 joints at both hips and knees.
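
As a quick sanity check, the state and action spaces described above can be inspected directly with Gymnasium (a minimal sketch using the standard API; the hardcore variant is enabled with `hardcore=True`):

```python
import gymnasium as gym

# Create the standard environment (pass hardcore=True for BipedalWalkerHardcore).
env = gym.make("BipedalWalker-v3")
obs, info = env.reset(seed=0)

print(env.observation_space)  # continuous Box with 24 dimensions (the state above)
print(env.action_space)       # 4 motor speed values, each in [-1, 1]

# Take a single step with a random action.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```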


## The Project:
We started by deciding which RL algorithms would be best suited to BipedalWalker-v3, so we tested several different algorithms in normal mode for 5M timesteps:



After analyzing the results, we chose to use **PPO**, **SAC**, and **TRPO**.
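
For reference, the comparison can be reproduced with Stable-Baselines3 and sb3-contrib along these lines (a minimal sketch, not the exact train_models.py script; the save names are illustrative):

```python
import gymnasium as gym
from stable_baselines3 import PPO, SAC
from sb3_contrib import TRPO

env_id = "BipedalWalker-v3"

# Give every candidate algorithm the same training budget and compare.
for algo in (PPO, SAC, TRPO):
    model = algo("MlpPolicy", gym.make(env_id), verbose=0)
    model.learn(total_timesteps=5_000_000)  # 5M timesteps, as in the comparison
    model.save(f"{env_id}_{algo.__name__}")
```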


### First Phase:
Initially, we decided to **test small rewards** to understand the **impact of each one** and whether its use was justified, training the models for **30M timesteps**. To do this, we ran two tests:

| Tests | Rewards | Penalties | Video |
| :- | :- | :- | :- |
| Test 1 | Overcome steep terrain | Sudden vertical moves (e.g., falling)<br>Severe instability (e.g., torso inclination) | |
| Test 2 | Alternate feet | Sudden vertical moves (e.g., falling) | |


### Second Phase:
After identifying the initial errors, we realized that we should encourage the agent to walk forward and remove the alternate-feet reward (a sketch of one such reward wrapper follows the lists below):

**Rewards:**
- Overcome steep terrain
- Moving forward

**Penalties:**
- Vertical sudden moves (ex: falling)
- Severe instability (ex: torso inclination)
- Stands still for a long time or stops moving
- Agent fails
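
A minimal sketch of how this kind of reward shaping can be applied through a Gymnasium wrapper (the coefficients, thresholds, and shaping terms here are illustrative assumptions, not the exact values used in the rewards folder):

```python
import gymnasium as gym

class ShapedRewardWrapper(gym.Wrapper):
    """Illustrative reward shaping for BipedalWalker-v3."""

    def __init__(self, env, forward_bonus=1.0, tilt_penalty=-0.1, still_penalty=-0.5):
        super().__init__(env)
        self.forward_bonus = forward_bonus
        self.tilt_penalty = tilt_penalty
        self.still_penalty = still_penalty
        self.still_steps = 0

    def reset(self, **kwargs):
        self.still_steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        hull_angle = obs[0]  # hull tilt (first state dimension)
        vel_x = obs[2]       # horizontal hull speed (third state dimension)
        # Reward moving forward.
        reward += self.forward_bonus * max(vel_x, 0.0)
        # Penalize severe instability (torso inclination).
        if abs(hull_angle) > 0.5:
            reward += self.tilt_penalty
        # Penalize standing still or stopping for a long time.
        self.still_steps = self.still_steps + 1 if abs(vel_x) < 0.01 else 0
        if self.still_steps > 50:
            reward += self.still_penalty
        # Penalize failure (on top of the environment's own -100).
        if terminated:
            reward -= 10.0
        return obs, reward, terminated, truncated, info
```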

**Videos:**

| PPO | TRPO |
| :-: | :-: |
| | |

| SAC | CONTROL |
| :-: | :-: |
| | |

### Third Phase:
In an attempt to further improve on the second phase, we ran a third test, changing the rewards again and trying to correct the minor errors detected in the previous phase.

**Rewards:**
- Overcome steep terrain
- Moving forward
- Lifting its legs from the ground
- Using each leg the same number of times

**Penalties:**
- Vertical sudden moves (ex: falling)
- Severe instability (ex: torso inclination)
- Stands still for a long time or stops moving
- Agent fails

**Videos:**

| PPO | TRPO |
| :-: | :-: |
| | |

| SAC | CONTROL |
| :-: | :-: |
| | |

## Other Tests:
In addition to the tests described above, we also performed two other tests:

### Feet VS No Feet:
To understand whether the agent would move better with feet, we **created an agent with feet**.
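
One way to run such a customized walker is to register it as a new Gymnasium environment (a hypothetical sketch: it assumes bipedal_walker_custom.txt has been saved as a Python module that defines a `BipedalWalkerCustom` class, which may differ from the actual setup):

```python
import gymnasium as gym
from gymnasium.envs.registration import register

# Hypothetical module/class names; adapt them to the contents of
# bipedal_walker_custom.txt once saved as bipedal_walker_custom.py.
register(
    id="BipedalWalkerCustom-v0",
    entry_point="bipedal_walker_custom:BipedalWalkerCustom",
    max_episode_steps=1600,  # same episode limit as BipedalWalker-v3
)

env = gym.make("BipedalWalkerCustom-v0")
```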



### Hyperparameter Tuning
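
We also experimented with tuning the algorithms' hyperparameters. A minimal sketch of how such tuning could be done with Optuna (Optuna itself is an assumption here, as are the search space and budgets):

```python
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial):
    # Hypothetical search space; the ranges actually explored may differ.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.999)
    model = PPO("MlpPolicy", gym.make("BipedalWalker-v3"),
                learning_rate=lr, gamma=gamma, verbose=0)
    model.learn(total_timesteps=200_000)  # short budget per trial
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```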


## About the repository:

- rewards ➡️ A folder with the Python files for the different rewards that were used;
- Assignment.pdf ➡️ Project statement;
- BipedalWalker.pptx ➡️ A PowerPoint with information about the work developed;
- ExtraBW.pptx ➡️ A PowerPoint with some extra information about the work developed (graphs, videos...);
- bipedal_walker_custom.txt ➡️ If you want to try the BipedalWalker with feet, this is the file to use;
- rewards_train.py ➡️ The code for training the agent with the custom rewards;
- test_model.py ➡️ The code used to test the agents;
- train_models.py ➡️ The code used to train the agents.
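
For testing, loading a saved agent follows the usual Stable-Baselines3 pattern (a minimal sketch, not the exact test_model.py; the model path is hypothetical):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical save name; use the algorithm and path you actually trained.
model = PPO.load("BipedalWalker-v3_PPO")

env = gym.make("BipedalWalker-v3", render_mode="human")
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```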

Note:
- When training agents, several algorithms are trained at the same time; you can choose which algorithms to run, how many parallel environments each one uses, and whether each algorithm runs on CPU or GPU (see the sketch below).
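
A minimal sketch of that setup (the algorithm-to-device mapping and the environment counts are illustrative assumptions):

```python
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.env_util import make_vec_env

# Hypothetical configuration: algorithm -> (parallel envs, device).
configs = {
    PPO: (8, "cpu"),   # on-policy PPO often scales well with many CPU envs
    SAC: (4, "cuda"),  # off-policy SAC can benefit from a GPU
}

for algo, (n_envs, device) in configs.items():
    vec_env = make_vec_env("BipedalWalker-v3", n_envs=n_envs)
    model = algo("MlpPolicy", vec_env, device=device, verbose=0)
    model.learn(total_timesteps=30_000_000)  # 30M timesteps, as in our phases
    model.save(f"bipedalwalker_{algo.__name__}")
```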


## Link to the course:

This course is part of the **first semester** of the **third year** of the **Bachelor's Degree in Artificial Intelligence and Data Science** at **FCUP** and **FEUP** in the academic year 2024/2025. You can find more information about this course at the following link:



- Link to Course
- FCUP
- FEUP