{"id":20325091,"url":"https://github.com/mphe/upside-down-reinforcement-learning","last_synced_at":"2025-09-23T17:56:18.710Z","repository":{"id":190664571,"uuid":"682076833","full_name":"mphe/upside-down-reinforcement-learning","owner":"mphe","description":"An extended Upside Down Reinforcement Learning implementation based on PyTorch and Stable Baselines 3.","archived":false,"fork":false,"pushed_at":"2024-08-04T16:07:22.000Z","size":116,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-25T15:49:04.264Z","etag":null,"topics":["baselines","down","learning","machine","pytorch","reinforcement","rl","stable","upside"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mphe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-23T11:43:15.000Z","updated_at":"2025-03-13T17:04:27.000Z","dependencies_parsed_at":"2024-08-04T18:11:31.962Z","dependency_job_id":"9b897ead-aafc-49a5-835a-890ba28ab94e","html_url":"https://github.com/mphe/upside-down-reinforcement-learning","commit_stats":null,"previous_names":["mphe/upside-down-reinforcement-learning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mphe%2Fupside-down-reinforcement-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mphe%2Fupside-down-reinforcement-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/mphe%2Fupside-down-reinforcement-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mphe%2Fupside-down-reinforcement-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mphe","download_url":"https://codeload.github.com/mphe/upside-down-reinforcement-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248468502,"owners_count":21108830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["baselines","down","learning","machine","pytorch","reinforcement","rl","stable","upside"],"created_at":"2024-11-14T19:38:41.287Z","updated_at":"2025-09-23T17:56:13.664Z","avatar_url":"https://github.com/mphe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Upside Down Reinforcement Learning\n\nThis project implements *Upside Down Reinforcement Learning* using PyTorch and Stable Baselines 3.\nThe original algorithm, as presented in the paper, has been extended to support additional features,\nlike multi-threading or weighted replay buffer sampling.\nIt also provides an interface similar to Stable Baselines algorithms, so it can be used mostly\nanalogously.\n\nUDRL theory paper: \u003chttps://arxiv.org/pdf/1912.02875.pdf\u003e\u003cbr/\u003e\nUDRL implementation paper: \u003chttps://arxiv.org/pdf/1912.02877.pdf\u003e\n\n\n## Related Work\n\nVarious open source implementations already 
exist\n[[1](https://github.com/BY571/Upside-Down-Reinforcement-Learning)]\n[[2](https://github.com/drozzy/upsidedown)]\n[[3](https://jscriptcoder.github.io/upside-down-rl/Upside-Down_RL.html)]\n[[4](https://github.com/haron1100/Upside-Down-Reinforcement-Learning)]\n[[5](https://github.com/bprabhakar/upside-down-reinforcement-learning)]\n[[6](https://github.com/kage08/UDRL)]\n[[7](https://github.com/parthchadha/upsideDownRL)]\n[[8](https://github.com/AI-Core/Reinforcement-Learning)],\nbut most of them are difficult to extend and maintain because they are written in a sloppy manner,\nor are incorrect, e.g. they do not use multiplicative interactions or they contain smaller bugs and issues.\n\nThis project was initially based on [BY571's implementation](https://github.com/BY571/Upside-Down-Reinforcement-Learning),\nbut was rewritten from scratch to fix bugs, potentially improve performance, provide a proper OOP\ninterface, and reuse code from Stable Baselines 3 where applicable.\nFurthermore, the algorithm has been extended to support additional features, like multi-threading.\n\n\n## Setup\n\nInstall dependencies using `pip install -r requirements.txt`.\n\nFor examples, see below.\n\n## Action space\n\nOnly discrete action spaces are supported for now.\n\nThe code is mostly written to support other action spaces, especially since respective functionality\nfrom SB3 is used where applicable, but it needs more work and testing to make them behave correctly.\n\n\n## Features\n\nCNNs are technically supported, but do not work because of exploding gradients.\nContributions to fix them are welcome.\n\nCUDA is supported.\n\nAdditional features that were not included in the original paper:\n\n- Multi-threading\n- Weighted replay buffer sampling - Useful for environments with vastly varying episode lengths\n- Seamless trajectory compression in memory - Useful for environments with very long episodes and large\n  observations, e.g. 
images\n- Evaluation on multiple episodes\n- Option to sample non-trailing trajectory slices for training\n\n\n## Examples\n\nSee `train_cartpole.py` and `train_pong.py` for a Cart Pole and Pong example.\n\nNote that Pong training will eventually get stuck because of CNNs being broken, as mentioned above.\n\nFor additional dependencies see here:\n* Cart Pole: \u003chttps://gymnasium.farama.org/environments/classic_control/\u003e\n* Pong: \u003chttps://gymnasium.farama.org/environments/atari/\u003e\n\n### Example Cart Pole Evaluation\n\n![](./cartpole_stats.png)\n\n\n## Contributing\n\nContributions are welcome, especially to make CNNs work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmphe%2Fupside-down-reinforcement-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmphe%2Fupside-down-reinforcement-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmphe%2Fupside-down-reinforcement-learning/lists"}