{"id":26275343,"url":"https://github.com/luigimuratore/ctrlaltrover","last_synced_at":"2026-02-19T01:02:53.275Z","repository":{"id":274948705,"uuid":"924585364","full_name":"luigimuratore/CtrlAltRover","owner":"luigimuratore","description":"Reinforcement Learning and Sim-to-Real transfer for autonomous parking. Physical construction of a mobile robot to test the Sim-to-Real transfer. The goal of this physical robot is to implement a reinforcement learning algorithm to enable autonomous navigation and tasks such as parking.","archived":false,"fork":false,"pushed_at":"2025-03-12T15:19:02.000Z","size":73573,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"v1_BASE","last_synced_at":"2025-03-31T01:31:43.745Z","etag":null,"topics":["3d-printing","arduino-mega","autonomous-vehicles","dcmotor","hc-sr04","laser-cutting","lidar","ln298n","manipulator-robotics","mecanum-wheel","mujoco","proximal-policy-optimization","python","raspberry-pi","reinforcement-learning","ros","ros2-humble","webserver","yolo"],"latest_commit_sha":null,"homepage":"https://www.youtube.com/watch?v=acG5F-Vcsfw\u0026t=616s","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luigimuratore.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-30T09:37:53.000Z","updated_at":"2025-03-12T09:50:05.000Z","dependencies_parsed_at":"2025-03-12T10:37:47.392Z","dependency_job_id":"1c194f88-9cdf-4aaa-96a8-8a5f4c637782","html_url":"https://github.com/luigimuratore/CtrlAltRover","commit_stats":null,"previous_n
ames":["luigimuratore/crtlaltrover","luigimuratore/ctrlaltrover"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luigimuratore%2FCtrlAltRover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luigimuratore%2FCtrlAltRover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luigimuratore%2FCtrlAltRover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luigimuratore%2FCtrlAltRover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luigimuratore","download_url":"https://codeload.github.com/luigimuratore/CtrlAltRover/tar.gz/refs/heads/v1_BASE","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252720291,"owners_count":21793737,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-printing","arduino-mega","autonomous-vehicles","dcmotor","hc-sr04","laser-cutting","lidar","ln298n","manipulator-robotics","mecanum-wheel","mujoco","proximal-policy-optimization","python","raspberry-pi","reinforcement-learning","ros","ros2-humble","webserver","yolo"],"created_at":"2025-03-14T10:15:34.013Z","updated_at":"2026-02-19T01:02:53.181Z","avatar_url":"https://github.com/luigimuratore.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Contributors](https://img.shields.io/github/contributors/luigimuratore/CtrlAltRover)](https://github.com/luigimuratore/CtrlAltRover/graphs/contributors)\n\n\n\u003cdiv align=\"center\"\u003e\n  \n# **Ctrl+Alt+Rover**  \n# v1 BASE\n\nYouTube video: https://youtu.be/acG5F-Vcsfw\n\n\n\n\u003cimg src=\"https://github.com/user-attachments/assets/bb571947-9537-4be8-b1b9-f51236d3c7e8\" width=\"258\" \u003e\n\n\u003cimg src=\"https://github.com/user-attachments/assets/8aec6359-d8e9-464a-b26d-df6d11f6b411\" width=\"250\" \u003e\n\n### Master's Degree in Mechatronic Engineer  \n\n\u003cimg src=\"https://github.com/luigimuratore/Fluid_Automation-CONVEYORS/assets/126814136/c104c1e7-fe39-4fee-b0c7-95fbba004564\" width=\"80\" \u003e\n\n# **Robot Learning**\n\n### Muratore Luigi\t\t\n### Gennero Giorgia\n\n--------------------\n\u003c/div\u003e\n\n\n## Description\n\nThe extension of the analysis involved the physical construction of a mobile robot to test the Sim-to-Real transfer. The robot was built entirely at home and features the following key components:\n\n- **Chassis and Wheels:**  \n  The robot’s structure is a rectangular wooden frame, laser-cut with a Cartesian robot. The wheels are Mecanum wheels, chosen for their omnidirectional capabilities, which allow the vehicle to move in any direction. Both the body and the rollers of the wheels were designed in Autodesk Inventor (see the CAD model image below) and 3D-printed in PLA before being assembled. The fully assembled rover is shown in the image following the description.\n\n- **Manipulator Arm:**  \n  A 5-degree-of-freedom manipulator arm is mounted on top of the chassis, further enhancing its functionality.\n\n- **Actuation:**  \n  The robot is powered by four DC motors controlled using two L298N drivers. 
A 12V power bank serves as the power source.\n\n- **Sensors:**  \n  The robot is equipped with:\n  - Four ultrasonic sensors for measuring distances, mounted on the four sides of the robot.\n  - Four infrared sensors mounted under the central body.\n  - A camera for potential vision-based tasks.\n  - A LiDAR for mapping and obstacle detection.\n\n- **Controller:**  \n  A Raspberry Pi 4b is used to control all functionalities, from running Python scripts to test each feature independently to using ROS2 Humble for handling sensors, motion control, and even loading trained models for real-world tests.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/bb571947-9537-4be8-b1b9-f51236d3c7e8\" width=\"200\" \u003e\n\u003c/div\u003e\n\nThe goal of this physical robot is to implement a reinforcement learning algorithm to enable autonomous navigation and tasks such as parking. The robot must learn to navigate close to a wall on its right side and park inside a designated area. The initial requirement is that the robot must be positioned close to the wall and not too far from the parking slot.\n\n---\n\n## Mecanum Wheels Insight\n\nMecanum wheels allow omnidirectional motion, making them ideal for applications requiring high maneuverability. Each wheel has external rollers positioned at a 45° angle relative to the wheel’s axis. This configuration enables movement in any direction by varying the speed and rotation of each wheel. 
For example:\n\n- **Forward/Backward:**  \n  All wheels rotate at the same speed and in the same direction, so the longitudinal force vectors add up while the transverse vectors cancel out.\n\n- **Rotation:**  \n  Wheels on one side rotate in one direction while those on the opposite side rotate in reverse, generating a torque about the vertical axis.\n\n- **Sideways:**  \n  Diagonal wheels rotate in the same direction and the other diagonal wheels rotate oppositely, causing the transverse vectors to sum up while the longitudinal ones cancel.\n\nThis unique capability makes Mecanum wheels ideal for robotics and transfer vehicles where space and omnidirectional motion are critical.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/3369ff62-d6d7-4eb0-a90a-8e0d2cfab36f\" width=\"250\" \u003e\n\u003c/div\u003e\n---\n\n## Simulation Environment – MuJoCo\n\nThe robot was simulated in the MuJoCo environment to test its behavior. The simulation involved creating a virtual model of the rover and its surroundings:\n\n- **Robot Model:**  \n  The rover is modeled with a central body and spherical wheels. To approximate the behavior of the Mecanum wheels (which are complex to model), spheres with two actuators (representing two rotational axes) were used. Gravity is incorporated into the model.\n\n- **Sensors:**  \n  Ultrasonic sensors are placed at the center of the lateral faces to measure distances.\n\n- **Environment:**  \n  The simulation environment consists of a flat floor and walls that define the parking area. 
Note that all dimensions in the simulation are scaled three times larger than the physical robot because of issues encountered in the MuJoCo environment.\n\nThe simulation is controlled via an XML file (`rover.xml`), which defines the robot's features and environment.\n\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/50dda0cc-047a-402e-9561-840d371337ee\" width=\"250\" \u003e\n\u003c/div\u003e\n\nBelow are images showing the full simulation environment:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/5014884e-936f-4dfb-93d4-2a9b04bb644f\" width=\"285\" \u003e\n\n\u003cimg src=\"https://github.com/user-attachments/assets/4f76b070-d651-4efa-8ed5-23cb9df78e8a\" width=\"250\" \u003e\n\u003c/div\u003e\n\nThe training and testing processes are managed by the scripts `train_rover.py` and `test_rover.py`, respectively.\n\n### Observation and Action Space\nThe observation is a vector of the distances measured by the four ultrasonic sensors. For example:  \n\nobs = [distance_front, distance_rear, distance_left, distance_right]\n\n\n- **Action Space:**  \nThe action space is simplified for the parking task. The available actions are described in the table below.\n\n| Action | Description  |\n| ------ | ------------ |\n| 0      | Move forward |\n| 1      | Move right   |\n| 2      | Stay still   |\n\n---\n\n## Training\n\nThe **Proximal Policy Optimization (PPO)** algorithm is used for training. PPO improves the expected reward while ensuring stable policy updates by using a clipped objective function that prevents excessively large updates. To enhance exploration, an **epsilon-greedy policy** strategy is integrated. The agent selects a random action with probability ε and follows the best-known action with probability (1 – ε). 
This trade-off between exploration and exploitation allows the agent to experience a variety of situations, leading to a more robust policy.\n\nThe following hyperparameters were used during training:\n- **Learning rate (α):** 3 × 10⁻⁴\n- **Clipping parameter (ε):** 0.2\n- **Discount factor (γ):** 0.98\n- **Batch size:** 64\n- **Entropy coefficient:** 0.05\n- **Number of steps:** 1024\n\n---\n\n## Reward Function\n\nIn training, a form of **curriculum learning** was implemented to gradually introduce more complex tasks to the agent. Initially, the reward function encouraged simple behaviors (e.g., moving forward) by providing high rewards for advancing. Once this skill was learned, more challenging actions (such as translating right or stopping) were gradually introduced.\n\nKey aspects of the reward function:\n- A positive reward is given when the robot moves right into the parking slot (i.e., first moving forward, then translating right).\n- A negative reward is given if the robot touches the walls, remains stationary when it should move forward, or continues moving forward when it should translate right.\n- A penalty is applied for any incorrect action.\n\n### Mathematical Description\n\nThe reward function is defined as follows:\n\nr = { 1 - 0.2 * a + malus, if not park { 0 + 𝟙(a ≠ 0) * (1001 - a^6) + malus, if park\n\nwhere the malus is defined by:\n\nmalus = { -3, if crash and |d_right| \u003c d_safe { 0, otherwise\n\n\nwith:\n- `park = True` if |d_right| \u003e 2\n- `crash = True` if |d_right| \u003c d_safe and not park\n\nThe termination condition is:\n\ndone = True if current_step ≥ max_steps\n\n\nHere, 𝟙(·) is the indicator function that returns 1 when the condition is true and 0 otherwise, and **a** represents the action index taken by the agent.\n\n### Reward Algorithm Pseudocode\n\n```plaintext\nreward ← 1\ncrash ← False\n\nif distance_right \u003c safe_distance then\n    crash ← True\n\nif not park then\n    reward ← reward - 0.2 × action\n    if 
distance_right \u003e 2 then\n         park ← True\n    end if\nend if\n\nif park then\n    crash ← False\n    if action = 0 then\n         reward ← reward - 1\n    else\n         reward ← reward + 1000 - (action^6)\n    end if\nend if\n\nif crash and |distance_right| \u003c safe_distance then\n    reward ← reward - 3\nend if\n\ndone ← (current_step ≥ max_steps)\n```\n\n----------------------------------\n\n## Testing\n\nThe simplified action space eases policy learning by reducing the number of decisions the agent must consider. However, it also introduces limitations. For instance, if the rover becomes stuck (e.g., in a corner), it cannot recover by translating sideways due to the restricted set of actions.\n\nDuring testing, it was observed that while the rover often reached the parking area, it sometimes approached the wall too closely or got stuck at the corners. Despite these issues, when not hindered by these failure cases, the rover successfully reached its target position.\n\nThe trained model used for testing is stored as `parking.zip`.\n\n----------------------------------\n\n## Real Rover Performance\nThe best-trained model was transferred to the Raspberry Pi environment for testing on the real rover. 
Although deploying an unstable policy in a real-world setting is generally not recommended, the simplicity of the actions and the non-hazardous nature of the tests allowed for real-world trials.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/8b12c8fe-c3d5-47ee-9abd-8a146f47e1d0\" width=\"330\" \u003e\n  \n  \u003cimg src=\"https://github.com/user-attachments/assets/827f0028-3ad1-4366-b9fa-303d37b6705a\" width=\"250\" \u003e\n\u003c/div\u003e\n\nKey points during real-world testing:\n\n- **Hardware and Software Challenges:**  \n  Some issues arose due to hardware compatibility and software integration.\n\n- **Environmental Discrepancies:**  \n  The simulated environment (scaled larger due to MuJoCo issues) did not perfectly match the real-world setup. For instance, while the rover’s mass was accurately derived from the CAD model, friction was not modeled in simulation: the 3D-printed wheels were coated with a rubber-like material, which made it hard to compute an exact friction coefficient.\n\n- **Performance:**  \n  Multiple tests were performed with varied initial positions and parking area sizes. Although the rover generally recognized the target parking area, it often failed to enter the slot correctly, sometimes getting stuck near the walls.\n\n----------------------------------\n\n\n## Conclusion and Future Work\nThis project explored the challenges and solutions of sim-to-real transfer in reinforcement learning using both a simulated Hopper environment and a custom-built rover. 
By applying Proximal Policy Optimization (PPO) and Uniform Domain Randomization, the study demonstrated the importance of robust training methods to bridge the gap between simulation and reality.\n\nKey takeaways include:\n\n- The physical robot showcases the potential of RL in real-world tasks such as autonomous navigation and parking.\n- Curriculum learning and a carefully designed reward function enabled incremental learning of complex behaviors.\n- The current policy, although promising, suffers from instability and limited recovery options when encountering unfavorable states.\n\nFuture work will focus on:\n\n- Expanding the action space to enable more nuanced decision-making and recovery strategies.\n- Integrating motor encoder data to improve positional accuracy.\n- Utilizing the onboard infrared sensors for smoother navigation.\n- Incorporating the front-mounted camera with computer vision algorithms (e.g., object recognition) to assist in various tasks.\n\nThis project underscores the potential of reinforcement learning and sim-to-real transfer in robotics while highlighting areas for further improvement and research.\n\n\n----------------------------------\n----------------------------------\n\n# Work in progress\n\n## API to control the robot via server:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://github.com/user-attachments/assets/5d36e8cd-0a53-47b1-a3dc-7b7b7b9d58ee\" width=\"500\"/\u003e\n\u003c/div\u003e\n\nIn the `server.py` is ...\n\n----------------------------------\n\n## LiDAR\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"media/lidar.png\" width=\"500\"/\u003e\n\u003c/div\u003e\n\n----------------------------------\n\n## Object recognition with YOLO\n\nImplementing YOLO algorithm to 
...\n\n----------------------------------\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluigimuratore%2Fctrlaltrover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluigimuratore%2Fctrlaltrover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluigimuratore%2Fctrlaltrover/lists"}