{"id":17361018,"url":"https://github.com/diegoferigo/phd-thesis","last_synced_at":"2025-08-09T23:37:09.547Z","repository":{"id":172750349,"uuid":"567834859","full_name":"diegoferigo/phd-thesis","owner":"diegoferigo","description":"Simulation Architectures for Reinforcement Learning applied to Robotics","archived":false,"fork":false,"pushed_at":"2024-09-04T07:59:13.000Z","size":3286,"stargazers_count":9,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"overleaf","last_synced_at":"2025-04-15T00:34:00.310Z","etag":null,"topics":["algorithms","latex","manchester","phd","phd-thesis","reinforcement-learning","rigid-body-dynamics","robotics","simulations","synthetic-data","thesis","university"],"latest_commit_sha":null,"homepage":"http://diegoferigo.github.io/phd-thesis","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diegoferigo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-18T17:35:06.000Z","updated_at":"2024-12-22T20:27:25.000Z","dependencies_parsed_at":"2025-04-15T00:30:16.974Z","dependency_job_id":"6d52964e-1209-417f-bc19-713a5faa82d0","html_url":"https://github.com/diegoferigo/phd-thesis","commit_stats":null,"previous_names":["diegoferigo/phd-thesis"],"tags_count":4,"template":false,"template_full_name":"diegoferigo/classicthesis-uom","purl":"pkg:github/diegoferigo/phd-thesis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diegoferigo%2Fphd-thesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diegoferigo%2
Fphd-thesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diegoferigo%2Fphd-thesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diegoferigo%2Fphd-thesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/diegoferigo","download_url":"https://codeload.github.com/diegoferigo/phd-thesis/tar.gz/refs/heads/overleaf","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diegoferigo%2Fphd-thesis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269654118,"owners_count":24454318,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-09T02:00:10.424Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","latex","manchester","phd","phd-thesis","reinforcement-learning","rigid-body-dynamics","robotics","simulations","synthetic-data","thesis","university"],"created_at":"2024-10-15T19:29:48.304Z","updated_at":"2025-08-09T23:37:09.518Z","avatar_url":"https://github.com/diegoferigo.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\nSimulation Architectures\u003cbr\u003e\nfor\u003cbr\u003e\nReinforcement Learning applied to Robotics\n\u003c/h1\u003e\n\n\u003ch3 align=\"center\"\u003eUniversity of Manchester\u003c/h3\u003e\n\u003ch4 
align=\"center\"\u003e2022\u003c/h4\u003e\n\n\u003cp align=\"center\" \u003e\n\u003ca href=\"https://github.com/diegoferigo/phd-thesis/releases/latest/download/thesis.pdf\"\u003e\n\u003cimg alt=\"phd_thesis_pdf\" src=\"https://img.shields.io/badge/PDF-phd_thesis-D21517.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAGXRFWHRTb2Z0d2FyZQB3d3cuaW5rc2NhcGUub3Jnm+48GgAABCBJREFUeJztml9oHEUcx78ze7eXs9ek6YUYa7Vp6LWIUDCtfbC0BgPxQQRLIcRSCIJQEN+qb2IfVAhi82BBqyC+WKQU2vgntvSMolJFI/VPaUoamqR48bpJLm2aP7fe7cz4cPZiyV1625nxLmY+b3vz2+/87sPu3uzuAQaDwWAwGAwrFKIj1HHGdhKCXYQgJJuVTDpfb93a/K2KvgqhXMDEROKoEDigKm9oaCTtuulnWlvbzqrK/DdUZZjj/LFX5ZcHACFEmHP09PWdbVOZewulAgghT6nMu4VOCUoFAFilOC+PLgmqBWhFh4RlJQBQL2HZCQDUSliWAgB1EipeACHFlyoqJFS8gEhk6R8WWQkVL6C+PopwuGrJGhkJSpfC4+OJ4wDaVWYCAOccU1M3MD+fXrKOEJKurl61MxZ7+JdSswPS3f0HUEpRV7e2lNIwgE0AShZQ8aeAboyAcjdQboyAcjdQboyAcjdQbnyvA663tDQS294OIRYtovjpeIOorXFVNCYoBd+yOYiaaktFXjF8Cbje0tJoBYMXAERQ4CaFHvtYVV85Vkc8t/tNhnCVNgm+ToGAbW8HIRFdzSxiZjZA/0xmdU7h7xpQ4LDXjRBc65wr/iK44gVI3Q3ec6QLdMMDuQ3OwYdH4b77IfjQMKwdzQi/cvCfMQaedOD1fYfMqV6Asdz+Rw+Drmu4LdPrPw/3tcMybflC7nbYouDJa8j2xkGqQgg+/STCh17GXOeLIISABCxker4ATzqwYk2wn3sWdMsmuG90AwCIRcETY8j2xvOR3JmQaskv0s8DhDOB7Kdnchu2jdC+vSBra/Pj3rkfwS4OIgvAnpxCqGMPMg9tBr90Obf/+CSyZ75aCORctiVfSAug998Hu7MjdwS07gZPOhCTKaBpw6Ja74d+hDr2wGp8MC8gsKMZqz87lq9Jv94N7/ufZNsqGWkBJFqLYOsugHN4Fy7hrw8+AoQoWGttzEnhqan8Z+zKKDKfnF7YHhqWbckX0gLY7wNIv9pVdJw03As6l4YVa0Lo+f1gVxNg53/Lj4vJFLwvv5Ft467R/kwwfPAFAIAQAqz/V7hH3gc8pnvakpES4L71DuB5Bce8gUHMv3QIACAYg7g2DjF987aadNfbQCYj04I0UgL48GjxwZlZsIHBpfe/MiIzvRJW/ErQl4DKOXPV4fcIGIEo8hunA0o5olGtR6mva8CaePznmba2dgjxOAq8VmPbHnkCNTUbVTTGKQF/dJslatcEVeQVY1m8G/RJe339+hOlFpuLoOK8OcV5d4GY9VOtVACl9HOVef4RN6an3XN+9lAqoK5u3Ukh8J7KTB/Mcy46Y7HYzTuXLqDlgWMqNfYYY3w3IVT6z9J3gnMOQmjKskRPNLo+oXs+g8FgMBgMhv8NfwMqrogtIDJkNgAAAABJRU5ErkJggg==\" /\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv style=\"text-align: justify; text-justify: 
inter-word;\"\u003e\n\u003cdetails\u003e\n\u003csummary\u003eAbstract\u003c/summary\u003e\n\n## Abstract\n\n\u003cp align=\"justify\"\u003e\nThere is no doubt that we are living in the age of data.\nIn the last two decades, the scientific community has been able to produce systems with superhuman capabilities through the combination of modern hardware advancements, novel learning algorithms and architectures, and advances in software frameworks.\nSuch progress revolutionised domains like computer vision and language processing, showing performance previously out of reach.\nOne may think that results could transfer straightforwardly to other fields like robotics until realising the existence of domain-specific characteristics and limitations hindering the potential of these learning methods.\nGenerating enough data from real-world robots is often too expensive or not even possible at the desired scale.\nData sampled from robots has a sequential nature, and not all families of learning algorithms are effective in this context.\nFurthermore, most algorithms that excel in this sequential setting, such as those belonging to the Reinforcement Learning (RL) family, learn by a trial-and-error process, which could lead to trajectories that damage either the robots or their surroundings.\n\u003c/p\u003e\n\n\u003cp align=\"justify\"\u003e\nIn this thesis, we attempt to answer the question,\n\u003ci\u003e\"How can modern technology help us generate synthetic data for humanoid robot planning and control?\"\u003c/i\u003e.\n\u003c/p\u003e\n\n\u003cp align=\"justify\"\u003e\nMotivated by the advancements in hardware accelerators that are revolutionising scientific computing, we limit our analysis to the simulation realm.\nIn this context, we first introduce a software architecture for structuring learning environments for robotics that can be adopted to train and run RL policies regardless of the simulated or real-world setting.\nWith its underlying simulation technology and 
exploiting a scheme based on reward shaping, we validate the architecture by\ntraining with RL a push-recovery controller capable of synthesising whole-body references for the humanoid robot iCub.\nThen, motivated by the need to overcome the bottlenecks related to the poor sampling performance of traditional rigid-body simulators, we present a new physics engine in reduced coordinates that can simulate robots interacting with a ground surface on hardware accelerators like GPUs and TPUs.\nTo this end, we present a contact-aware continuous state-space representation describing the dynamical evolution of floating-base robots that can be numerically integrated for simulation purposes.\nWe adopt the new general-purpose Gazebo Sim simulator as our first solution to sample synthetic data, and exploit JAX and its hardware support to scale the sampling performance for highly parallel problems.\nFurthermore, we implement and benchmark common Rigid Body Dynamics Algorithms that are part of the proposed physics engine on hardware accelerators and assess their scalability properties on different GPUs.\nThese pieces of technology help to lower the computational barriers that nowadays are still among the main bottlenecks for obtaining intelligent agents, democratising the applicability of this family of learning-based methods.\n\u003c/p\u003e\n\n\u003c/details\u003e\n\u003c/div\u003e\n\n## Citing\n\n```bib\n@phdthesis{ferigo_phd_thesis_2022,\n  title = {Simulation Architectures for Reinforcement Learning applied to Robotics},\n  author = {Ferigo, Diego},\n  school = {University of Manchester},\n  type = {PhD Thesis},\n  month = {July},\n  year = {2022},\n  url = {https://github.com/diegoferigo/phd-thesis/releases/latest/download/thesis.pdf},\n}\n```\n\n## Contributing\n\nIf you have a question or want to report an error, please [open an issue][new_issue].\n\nIf you want to fix the document yourself, please open a PR against the `main` branch (see [branching](#branching) details below).\nThe Continuous 
Integration pipeline implemented in this repository will compile the LaTeX sources with your contribution\nand upload the PDF document as an artifact of the workflow for inspection.\n\n[new_issue]: https://github.com/diegoferigo/phd-thesis/issues/new\n\n## Branching\n\nThis repository has two branches:\n\n- **`overleaf`** is the branch connected to my personal Overleaf project.\n- **`main`** is the branch associated with external contributions and releases.\n\nThe Overleaf Git system [does not currently support branching][overleaf_no_branching].\nFor this reason, I cannot select **`main`** as the default branch of the repository, even though it is the one that external contributions and releases use.\n\n**If you want to contribute with a new PR, please target the `main` branch.**\n\n[overleaf_no_branching]: https://www.overleaf.com/learn/how-to/Using_Git_and_GitHub#Known_Limitations\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiegoferigo%2Fphd-thesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiegoferigo%2Fphd-thesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiegoferigo%2Fphd-thesis/lists"}