Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/kounelisagis/reinforcement-learning-playground

Last synced: 24 days ago
JSON representation

Host: GitHub
URL: https://github.com/kounelisagis/reinforcement-learning-playground
Owner: kounelisagis
Created: 2022-05-23T23:25:46.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-09-26T11:13:02.000Z (about 1 year ago)
Last Synced: 2023-09-26T13:48:43.966Z (about 1 year ago)
Language: Python
Size: 19.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        # Reinforcement-Learning-Playground

### Mountain Car Problem - Continuous

Used machin library to solve the Continuous Mountain Car Problem using PPO and TD3. Implemented the right actor-critic networks and found the right hyper-parameters.

![Expected Return per iteration](https://github.com/kounelisagis/Reinforcement-Learning-Playground/assets/36283973/bb51a58e-7d35-470e-800f-33c9654a0502)

*Expected Return per iteration.*

### Pendulum swing-up - Torques

Used RobotDART, OpenAI Gym spaces, created reward function and used PPO and TD3. Used Frame Skipping technique.

The initial position is defined as x0 = [π], the observation space is the vector: [cos θ, sin θ, torque], and the reward function uses the angle θ, torque, and the command given to the robot.

![Figure_1(1)](https://github.com/kounelisagis/Reinforcement-Learning-Playground/assets/36283973/bcf9147c-f9a2-423b-8f50-630066a95918)

*TD3 - Expected return per iteration.*

![Figure_3](https://github.com/kounelisagis/Reinforcement-Learning-Playground/assets/36283973/a1bbd01c-d43d-488a-9ba9-93d1b7b54727)

*PPO - Expected return per iteration.*

### Iiwa joint space RL-controller - Servo

Used RobotDART, OpenAI Gym spaces, created reward function and used PPO and TD3.

The observation space is a vector that contains all the positions and velocities of the robot's joints, and the reward function is the norm of the difference between the final and current positions.

![Figure_5](https://github.com/kounelisagis/Reinforcement-Learning-Playground/assets/36283973/6de9363c-9b4b-48a1-97e7-f402b2ebbd6c)

*TD3 - Expected return per iteration.*

![Figure_7](https://github.com/kounelisagis/Reinforcement-Learning-Playground/assets/36283973/8878edac-2b8f-450b-926d-d66fdf6302a3)

*PPO - Expected return per iteration.*