https://github.com/kylesayrs/drivingrl
Basic 2D car environment trained using reinforcement learning within the Stable Baselines 3 framework
- Host: GitHub
- URL: https://github.com/kylesayrs/drivingrl
- Owner: kylesayrs
- Created: 2023-05-12T19:47:52.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-04-25T03:31:59.000Z (9 months ago)
- Last Synced: 2024-04-25T04:33:03.028Z (9 months ago)
- Topics: proximal-policy-optimization, reinforcement-learning
- Language: Python
- Homepage:
- Size: 459 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Driving Reinforcement Learning #
This repo implements a basic 2D driving environment and a driving agent trained using reinforcement learning.
## Driving Environment ##
The environment contains three kinds of bodies: the car, obstacles, and the goal. When the car touches an obstacle, the agent receives a penalty; when the car touches the goal, it receives a positive reward. The gamma parameter controls how much the agent values future rewards relative to immediate ones. All three bodies are placed randomly within the scene, and an obstacle border surrounds the scene to prevent the car from going out of bounds.
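The repo's environment code is not reproduced here; the following is a toy sketch of just the reward structure described above (random placement, episode-ending penalty on obstacle contact, reward on goal contact), assuming a gymnasium-style API. The `DrivingEnvSketch` name, the simplified point-mass physics, and all constants are illustrative assumptions, not the repo's implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class DrivingEnvSketch(gym.Env):
    """Toy scene: a car point, circular obstacles, and a goal,
    all placed randomly inside a bordered square arena."""

    ARENA = 10.0             # half-width of the square scene (assumed)
    OBSTACLE_PENALTY = -1.0  # reward magnitudes are assumptions
    GOAL_REWARD = 1.0
    TOUCH_RADIUS = 0.5

    def __init__(self, n_obstacles=4):
        self.n_obstacles = n_obstacles
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(-self.ARENA, self.ARENA, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Random placement of car, goal, and obstacles within the arena.
        pts = self.np_random.uniform(-self.ARENA, self.ARENA, size=(self.n_obstacles + 2, 2))
        self.car, self.goal, self.obstacles = pts[0], pts[1], pts[2:]
        return self.car.astype(np.float32), {}

    def step(self, action):
        # The clip stands in for the obstacle border keeping the car in bounds.
        self.car = np.clip(self.car + action, -self.ARENA, self.ARENA)
        if np.linalg.norm(self.car - self.goal) < self.TOUCH_RADIUS:
            return self.car.astype(np.float32), self.GOAL_REWARD, True, False, {}
        if (np.linalg.norm(self.obstacles - self.car, axis=1) < self.TOUCH_RADIUS).any():
            return self.car.astype(np.float32), self.OBSTACLE_PENALTY, True, False, {}
        return self.car.astype(np.float32), 0.0, False, False, {}
```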
## Observations ##
The agent observes the following dictionary of values at each time step:
| Observation | Significance | Format |
| ----------- | ------------ | ------ |
| Car Velocity | Velocity of the car along the X and Y axes | (velocity_x, velocity_y) |
| Car Angle | Angle which the car is facing | (cos(theta), sin(theta)) |
| Car Angular Velocity | Rate at which the car's angle is changing | (angular_velocity, ) |
| Car Visual | Distance from the car to the nearest obstacle along each visual ray | (dist0, dist1, ...) |
| Goal Angle | Angle between the car's heading and the goal | (cos(theta), sin(theta)) |
| Goal Distance | Distance from the car to the goal | (distance, ) |
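Because these observations form a dictionary, they map naturally onto a `gymnasium.spaces.Dict`. The layout below is an assumption for illustration (the ray count `N_RAYS` is hypothetical), not the repo's exact definition.

```python
import numpy as np
from gymnasium import spaces

N_RAYS = 8  # hypothetical number of visual rays

observation_space = spaces.Dict({
    "car_velocity":         spaces.Box(-np.inf, np.inf, shape=(2,)),  # (velocity_x, velocity_y)
    "car_angle":            spaces.Box(-1.0, 1.0, shape=(2,)),        # (cos(theta), sin(theta))
    "car_angular_velocity": spaces.Box(-np.inf, np.inf, shape=(1,)),
    "car_visual":           spaces.Box(0.0, np.inf, shape=(N_RAYS,)), # ray distances
    "goal_angle":           spaces.Box(-1.0, 1.0, shape=(2,)),
    "goal_distance":        spaces.Box(0.0, np.inf, shape=(1,)),
})
```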
## Driving Agent ##
The agent was trained using proximal policy optimization (PPO), which estimates the advantage of the policy's actions over a baseline reward estimate and uses that advantage to compute the gradient with which the policy network is trained. Because the agent acts in a continuous action space, deep deterministic policy gradient (DDPG) was also evaluated, but it proved less sample efficient than PPO.
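With Stable Baselines 3, training then reduces to a few lines. This is a sketch: the `driving_env` import is hypothetical, and the hyperparameter values (including `gamma`, the discount factor discussed above) are assumed rather than taken from the repo.

```python
from stable_baselines3 import PPO

from driving_env import DrivingEnv  # hypothetical import; the repo's module name may differ

env = DrivingEnv()

# "MultiInputPolicy" is SB3's policy for dictionary observations like those
# in the table above; gamma=0.99 is an assumed value, not necessarily the repo's setting.
model = PPO("MultiInputPolicy", env, gamma=0.99, verbose=1)
model.learn(total_timesteps=500_000)
model.save("ppo_driving")
```

Swapping `PPO` for `stable_baselines3.DDPG` reproduces the comparison mentioned above; both algorithms support continuous action spaces.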
At each time step the agent produces the following actions:
| Action | Significance | Format |
| ------ | ------------ | ------ |
| Forward Acceleration | Change in the car's velocity along its forward direction; can be negative | (acceleration, ) |
| Angular Acceleration | Change in the car's angular velocity, relative to its current angle | (angle_acceleration, ) |
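Both actions are scalar and continuous, so together they form a two-dimensional `Box` action space; the unit bounds below are assumed for illustration.

```python
import numpy as np
from gymnasium import spaces

# (forward_acceleration, angular_acceleration); the [-1, 1] bounds are assumptions
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
```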
## Lessons learned ##
While experimenting with the car demo environment, the PPO agent was trained to navigate towards the goal while avoiding obstacles. However, the trained agent exhibited an unexpected behavior: it consistently spun in circles instead of moving directly towards the target. Investigation revealed that the absence of tire friction, combined with the agent only being able to measure its forward velocity, made spinning a locally optimal strategy: by rotating, the agent could perceive its velocity in all directions before risking a collision with an obstacle. The issue was resolved by giving the agent global velocity measurements, after which it adopted a more efficient, direct path towards the target.

This experience highlights the resourcefulness of reinforcement learning agents: they can uncover aspects of the environment that are not immediately apparent to the designer. It is therefore important to design the environment carefully and provide the agent with the information it needs to make informed decisions.
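To make the fix concrete, the sketch below contrasts the two observation schemes; the function names are illustrative, not taken from the repo.

```python
import numpy as np

def forward_velocity(velocity, heading_theta):
    """Original scheme: only the speed along the car's heading is observable,
    so sideways drift (there is no tire friction) is invisible to the agent."""
    heading = np.array([np.cos(heading_theta), np.sin(heading_theta)])
    return np.array([np.dot(velocity, heading)])

def global_velocity(velocity):
    """Fixed scheme: the full (velocity_x, velocity_y) vector is observed,
    removing the incentive to spin in order to sense drift."""
    return np.asarray(velocity, dtype=np.float32)
```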