https://github.com/negarhonarvar/deepreinforcementlearning
A complete collection of famous deep RL algorithms implemented in the most popular Gymnasium environments
- Host: GitHub
- URL: https://github.com/negarhonarvar/deepreinforcementlearning
- Owner: negarhonarvar
- License: MIT
- Created: 2024-05-09T18:57:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-13T16:12:53.000Z (about 2 months ago)
- Last Synced: 2025-04-23T18:15:10.652Z (about 1 month ago)
- Topics: boltzmann-exploration, cartpole-v1, d3qn, dqn, drl-algorithms, gymnasium-environment, lunar-lander, ppo-algorithm, sarsa, softmax-exploration, swimmer
- Language: Python
- Homepage:
- Size: 6.42 MB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Deep Reinforcement Learning Algorithms
Deep Reinforcement Learning course assignments by Dr. Armin Salimi Badr.
The code in this repository uses environments from the Gymnasium library.

## Prerequisites 📋
To run the code in this repository, you need to install:
- Gymnasium v1.0.0 (e.g. `pip install gymnasium==1.0.0`; the Box2D and MuJoCo extras are presumably needed for the Lunar Lander and Swimmer environments)
# CartPole-v1
### Objective
The goal of this environment is to balance a pole by applying forces to the cart in the left and right directions. It has a discrete action space:
- 0: Push cart to the left
- 1: Push cart to the right
### Observation Space
Upon taking an action, either left or right, the agent observes a 4-dimensional state consisting of:
- Cart Position
- Cart Velocity
- Pole Angle
- Pole Angular Velocity

A reward of +1 is granted to the agent at each step while the pole is kept upright. The maximum reward an agent can earn in a single episode is 500.
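A random-policy rollout (a minimal sketch for illustration, not one of the repository's agents) shows how these pieces fit together in the Gymnasium API:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)    # obs: cart position, cart velocity, pole angle, pole angular velocity

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                     # 0 = push left, 1 = push right
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                                 # +1 per step while the pole stays upright
    done = terminated or truncated                         # terminated: angle/position limits; truncated: 500-step cap
env.close()
print(f"Episode return: {total_reward}")
```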
### Termination
The episode ends under the following conditions:
- Termination: Pole angle is greater than ±12°
- Termination: Cart position is greater than ±2.4 (the center of the cart reaches the edge of the display)
- Truncation: Episode length exceeds 500 steps

## Agents
# Lunar Lander
This environment is part of the Box2D environments and is a classic rocket trajectory optimization problem.

### Objective
According to Pontryagin's maximum principle, it is optimal to fire the engine at full throttle or turn it off. This is the reason why this environment has discrete actions: engine on or off.

### Action Space
There are four discrete actions available:
- 0: do nothing
- 1: fire left orientation engine
- 2: fire main engine
- 3: fire right orientation engine

### Observation Space
The state is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
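A minimal sketch of inspecting this environment (the environment id below is Gymnasium's current one and may differ from the version used in this repository; the Box2D extra is assumed to be installed):

```python
import gymnasium as gym

# Gymnasium 1.0 registers this as "LunarLander-v3"; older releases used "LunarLander-v2".
env = gym.make("LunarLander-v3")
print(env.action_space)        # Discrete(4): no-op, left engine, main engine, right engine
print(env.observation_space)   # Box(8,): x, y, vx, vy, angle, angular velocity, left/right leg contact

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(2)   # action 2: fire main engine
env.close()
```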
## Agents
- DQN
- D3QN (dueling double DQN; see the sketch after this list)
- Enhanced D3QN
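A minimal sketch of the dueling Q-network head used by D3QN-style agents, assuming a PyTorch implementation (the layer sizes and the LunarLander dimensions of 8 observations and 4 actions are illustrative, not taken from this repository):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.feature(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Combine streams: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)
```

In a full D3QN agent, the "double" part comes from using the online network to select the greedy next action while a target network evaluates it; the dueling head above only covers the value/advantage decomposition.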
# Swimmer Environment

The Swimmer environment in MuJoCo is a reinforcement learning environment where the goal is to control a multi-jointed swimmer to move forward quickly in a two-dimensional fluid. The swimmer is a small robotic entity with a simple body structure, consisting of a head and multiple tail-like segments. As described in the reference documentation, the swimmer consists of three or more segments ("links") connected by articulation joints ("rotors"); each rotor joint connects exactly two links, forming a linear chain.
### Objective
The primary goal of the agent in the Swimmer environment is to move to the right along the horizontal axis of a two-dimensional plane by actuating its rotors. When a trajectory ends, the agent always restarts from a fixed starting point. The reward function typically rewards the agent for the horizontal distance covered by the swimmer at each time step, accompanied by a penalty that discourages excessive actuation forces. The agent must therefore learn to propel itself forward both effectively and efficiently. For further information on this environment, see the Gymnasium documentation.
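As a rough illustration of that reward structure (a sketch following the Gymnasium documentation rather than code from this repository; the timestep and weights shown are assumed defaults):

```python
import numpy as np

def swimmer_reward(x_before: float, x_after: float, action: np.ndarray,
                   dt: float = 0.04,
                   forward_reward_weight: float = 1.0,
                   ctrl_cost_weight: float = 1e-4) -> float:
    """Reward forward progress along x, penalize large actuation torques."""
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))
    return forward_reward - ctrl_cost
```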
### Problem parameters
- n: number of body parts
- mᵢ: mass of part i (i ∈ {1…n})
- lᵢ: length of part i (i ∈ {1…n})
- k: viscous-friction coefficient

### Action Space
- 0: Torque applied on the first rotor, in the range (-1, 1)
- 1: Torque applied on the second rotor, in the range (-1, 1)
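A minimal sketch of driving these two rotors with random torques (the environment id and the MuJoCo extra are assumptions about the setup):

```python
import gymnasium as gym

env = gym.make("Swimmer-v5")   # requires the MuJoCo extra; older Gymnasium releases register "Swimmer-v4"
print(env.action_space)        # Box(-1.0, 1.0, (2,)): one torque per rotor

obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()                        # random torques in (-1, 1)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```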
### Observation Space
- qpos (3 elements by default): position values of the robot's body parts
- qvel (5 elements): the velocities of these individual body parts (their derivatives)

## Agents