{"id":13595006,"url":"https://github.com/Eclectic-Sheep/sheeprl","last_synced_at":"2025-04-09T10:32:38.543Z","repository":{"id":166139580,"uuid":"641586215","full_name":"Eclectic-Sheep/sheeprl","owner":"Eclectic-Sheep","description":"Distributed Reinforcement Learning accelerated by Lightning Fabric","archived":false,"fork":false,"pushed_at":"2024-05-01T20:09:26.000Z","size":22842,"stargazers_count":240,"open_issues_count":17,"forks_count":19,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-05-02T04:55:05.258Z","etag":null,"topics":["distributed","lightning","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://eclecticsheep.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Eclectic-Sheep.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-16T19:33:30.000Z","updated_at":"2024-05-04T20:37:54.020Z","dependencies_parsed_at":"2023-10-19T16:36:24.112Z","dependency_job_id":"e18d3a16-f387-4bb1-b08c-9a1ae7ef61d7","html_url":"https://github.com/Eclectic-Sheep/sheeprl","commit_stats":null,"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eclectic-Sheep%2Fsheeprl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eclectic-Sheep%2Fsheeprl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Eclectic-Sheep%2Fsheeprl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/Eclectic-Sheep%2Fsheeprl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Eclectic-Sheep","download_url":"https://codeload.github.com/Eclectic-Sheep/sheeprl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed","lightning","pytorch","reinforcement-learning"],"created_at":"2024-08-01T16:01:42.293Z","updated_at":"2025-04-09T10:32:33.530Z","avatar_url":"https://github.com/Eclectic-Sheep.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# ⚡ SheepRL 🐑\n\n[![Python 3.8](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)\n[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)\n[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/)\n[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/images/logo.svg\" style=\"width:40%\"\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ctable\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e\u003cimg src=\"https://github.com/Eclectic-Sheep/sheeprl/assets/18405289/6efd09f0-df91-4da0-971d-92e0213b8835\" width=\"200px\"\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cimg 
src=\"https://github.com/Eclectic-Sheep/sheeprl/assets/18405289/dbba57db-6ef5-4db4-9c53-d7b5f303033a\" width=\"200px\"\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cimg src=\"https://github.com/Eclectic-Sheep/sheeprl/assets/18405289/3f38e5eb-aadd-4402-a698-695d1f99c048\" width=\"200px\"\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cimg src=\"https://github.com/Eclectic-Sheep/sheeprl/assets/18405289/93749119-fe61-44f1-94bb-fdb89c1869b5\" width=\"200px\"\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ctable\u003e\n    \u003cthead\u003e\n      \u003ctr\u003e\n        \u003cth\u003eEnvironment\u003c/th\u003e\n        \u003cth\u003eTotal frames\u003c/th\u003e\n        \u003cth\u003eTraining time\u003c/th\u003e\n        \u003cth\u003eTest reward\u003c/th\u003e\n        \u003cth\u003ePaper reward\u003c/th\u003e\n        \u003cth\u003eGPUs\u003c/th\u003e\n      \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody\u003e\n      \u003ctr\u003e\n        \u003ctd\u003eCrafter\u003c/td\u003e\n        \u003ctd\u003e1M\u003c/td\u003e\n        \u003ctd\u003e1d 3h\u003c/td\u003e\n        \u003ctd\u003e12.1\u003c/td\u003e\n        \u003ctd\u003e11.7\u003c/td\u003e\n        \u003ctd\u003e1-V100\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003eAtari-MsPacman\u003c/td\u003e\n        \u003ctd\u003e100K\u003c/td\u003e\n        \u003ctd\u003e14h\u003c/td\u003e\n        \u003ctd\u003e1542\u003c/td\u003e\n        \u003ctd\u003e1327\u003c/td\u003e\n        \u003ctd\u003e1-3080\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e Atari-Boxing\u003c/td\u003e\n        \u003ctd\u003e100K\u003c/td\u003e\n        \u003ctd\u003e14h\u003c/td\u003e\n        \u003ctd\u003e84\u003c/td\u003e\n        \u003ctd\u003e78\u003c/td\u003e\n        \u003ctd\u003e1-3080\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003eDOA++(w/o 
optimizations)\u003csup\u003e1\u003c/sup\u003e\u003c/td\u003e\n        \u003ctd\u003e7M\u003c/td\u003e\n        \u003ctd\u003e18d 22h\u003c/td\u003e\n        \u003ctd\u003e2726/3328\u003csup\u003e2\u003c/sup\u003e\u003c/td\u003e\n        \u003ctd\u003eN.A.\u003c/td\u003e\n        \u003ctd\u003e1-3080\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003eMinecraft-Nav(w/o optimizations)\u003c/td\u003e\n        \u003ctd\u003e8M\u003c/td\u003e\n        \u003ctd\u003e16d 4h\u003c/td\u003e\n        \u003ctd\u003e27% \u0026gt;= 70\u003cbr\u003e14% \u0026gt;= 100\u003c/td\u003e\n        \u003ctd\u003eN.A.\u003c/td\u003e\n        \u003ctd\u003e1-V100\u003c/td\u003e\n      \u003c/tr\u003e\n    \u003c/tbody\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n1. For comparison: 1M in 2d 7h vs 1M in 1d 5h (before and after optimizations resp.)\n2. Best [leaderboard score in DIAMBRA](https://diambra.ai/leaderboard) (11/7/2023)\n\n#### Benchmarks\nThe training times of our implementations compared to the ones of Stable Baselines3 are shown below:\n\n\u003cdiv align=\"center\"\u003e\n  \u003ctable\u003e\n    \u003cthead\u003e\n      \u003ctr\u003e\n        \u003cth colspan=\"2\"\u003e\u003c/th\u003e\n        \u003cth\u003eSheepRL v0.4.0\u003c/th\u003e\n        \u003cth\u003eSheepRL v0.4.9\u003c/th\u003e\n        \u003cth\u003eSheepRL v0.5.2\u003cbr /\u003e(Numpy Buffers)\u003c/th\u003e\n        \u003cth\u003eSheepRL v0.5.5\u003cbr /\u003e(Numpy Buffers)\u003c/th\u003e\n        \u003cth\u003eStableBaselines3\u003csup\u003e1\u003c/sup\u003e\u003c/th\u003e\n      \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody\u003e\n      \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003e\u003cb\u003ePPO\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e192.31s \u0026plusmn; 1.11\u003c/td\u003e\n        \u003ctd\u003e138.3s \u0026plusmn; 0.16\u003c/td\u003e\n        \u003ctd\u003e80.81s 
\u0026plusmn; 0.68\u003c/td\u003e\n        \u003ctd\u003e81.27s \u0026plusmn; 0.47\u003c/td\u003e\n        \u003ctd\u003e77.21s \u0026plusmn; 0.36\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e\u003ci\u003e2 devices\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e85.42s \u0026plusmn; 2.27\u003c/td\u003e\n        \u003ctd\u003e59.53s \u0026plusmn; 0.78\u003c/td\u003e\n        \u003ctd\u003e46.09s \u0026plusmn; 0.59\u003c/td\u003e\n        \u003ctd\u003e36.88s \u0026plusmn; 0.30\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003e\u003cb\u003eA2C\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003e84.76s \u0026plusmn; 0.37\u003c/td\u003e\n        \u003ctd\u003e84.22s \u0026plusmn; 0.99\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e\u003ci\u003e2 devices\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003e28.95s \u0026plusmn; 0.75\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd rowspan=\"2\"\u003e\u003cb\u003eSAC\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e421.37s \u0026plusmn; 5.27\u003c/td\u003e\n        \u003ctd\u003e363.74s \u0026plusmn; 3.44\u003c/td\u003e\n        \u003ctd\u003e318.06s \u0026plusmn; 4.46\u003c/td\u003e\n        \u003ctd\u003e320.21 \u0026plusmn; 6.29\u003c/td\u003e\n        \u003ctd\u003e336.06s \u0026plusmn; 12.26\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        
\u003ctd\u003e\u003ci\u003e2 devices\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e264.29s \u0026plusmn; 1.81\u003c/td\u003e\n        \u003ctd\u003e238.88s \u0026plusmn; 4.97\u003c/td\u003e\n        \u003ctd\u003e210.07s \u0026plusmn; 27\u003c/td\u003e\n        \u003ctd\u003e225.95 \u0026plusmn; 3.65\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e\u003cb\u003eDreamer V1\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e4201.23s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003e2921.38s\u003c/td\u003e\n        \u003ctd\u003e2207.13s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e\u003cb\u003eDreamer V2\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e1874.62s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003e1148.1s\u003c/td\u003e\n        \u003ctd\u003e906.42s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n      \u003ctr\u003e\n        \u003ctd\u003e\u003cb\u003eDreamer V3\u003c/b\u003e\u003c/td\u003e\n        \u003ctd\u003e\u003ci\u003e1 device\u003c/i\u003e\u003c/td\u003e\n        \u003ctd\u003e2022.99s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n        \u003ctd\u003e1378.01s\u003c/td\u003e\n        \u003ctd\u003e1589.30s\u003c/td\u003e\n        \u003ctd\u003eN.D.\u003c/td\u003e\n      \u003c/tr\u003e\n    \u003c/tbody\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n\u003e [!NOTE]\n\u003e\n\u003e All experiments have been run on 4 CPUs in [Lightning Studio](https://lightning.ai/).\n\u003e All benchmarks, but the Dreamers' ones, have been run 5 times and we have taken the mean and the std of the runs. 
\n\u003e We have disabled the test function, the logging, and the checkpoints. Moreover, the models were not registered using MLFlow.\n\u003e \n\u003e Dreamers' benchmarks have been run 1 time with logging and checkpoints, without running the test function.\n\u003e\n\u003e 1. The StableBaselines3 version is `v2.2.1`, please install the package with `pip install stable-baselines3==2.2.1`\n\n## What\n\nAn easy-to-use framework for reinforcement learning in PyTorch, accelerated with [Lightning Fabric](https://lightning.ai/docs/fabric/stable/).  \nThe algorithms sheeped by sheeprl out-of-the-box are:\n\n| Algorithm                 | Coupled            | Decoupled          | Recurrent          | Vector obs         | Pixel obs          | Status             |\n| ------------------------- | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ |\n| A2C                       | :heavy_check_mark: | :x:                | :x:                | :heavy_check_mark: | :x:                | :heavy_check_mark: |\n| A3C                       | :heavy_check_mark: | :x:                | :x:                | :heavy_check_mark: | :x:                | :construction:     |\n| PPO                       | :heavy_check_mark: | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO Recurrent             | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| SAC                       | :heavy_check_mark: | :heavy_check_mark: | :x:                | :heavy_check_mark: | :x:                | :heavy_check_mark: |\n| SAC-AE                    | :heavy_check_mark: | :x:                | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| DroQ                      | :heavy_check_mark: | :x:                | :x:                | :heavy_check_mark: | :x:                | 
:heavy_check_mark: |\n| Dreamer-V1                | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Dreamer-V2                | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Dreamer-V3                | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V1) | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V2) | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V3) | :heavy_check_mark: | :x:                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\nand more are coming soon! [Open a PR](https://github.com/Eclectic-Sheep/sheeprl/pulls) if you have any particular request :sheep:\n\n\nThe actions supported by sheeprl agents are:\n| Algorithm                 | Continuous         | Discrete           | Multi-Discrete     |\n| ------------------------- | ------------------ | ------------------ | ------------------ |\n| A2C                       | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| A3C                       | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO                       | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| PPO Recurrent             | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| SAC                       | :heavy_check_mark: | :x:                | :x:                |\n| SAC-AE                    | :heavy_check_mark: | :x:                | :x:                |\n| DroQ                      | :heavy_check_mark: | :x:                | :x:                |\n| 
Dreamer-V1                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Dreamer-V2                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Dreamer-V3                | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V2) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n| Plan2Explore (Dreamer V3) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |\n\nThe environments supported by sheeprl are:\n| Environment        | Installation command         | More info                                       | Status             |\n| ------------------ | ---------------------------- | ----------------------------------------------- | ------------------ |\n| Classic Control    | `pip install sheeprl`           |                                                 | :heavy_check_mark: |\n| Box2D              | `pip install sheeprl[box2d]`    | Install `swig` first with `pip install swig` | :heavy_check_mark: |\n| Mujoco (Gymnasium) | `pip install sheeprl[mujoco]`   | [how_to/mujoco](./howto/learn_in_dmc.md)        | :heavy_check_mark: |\n| Atari              | `pip install sheeprl[atari]`    | [how_to/atari](./howto/learn_in_atari.md)       | :heavy_check_mark: |\n| DeepMind Control   | `pip install sheeprl[dmc]`      | [how_to/dmc](./howto/learn_in_dmc.md)           | :heavy_check_mark: |\n| MineRL             | `pip install sheeprl[minerl]`   | [how_to/minerl](./howto/learn_in_minerl.md)     | :heavy_check_mark: |\n| MineDojo           | `pip install sheeprl[minedojo]` | [how_to/minedojo](./howto/learn_in_minedojo.md) | :heavy_check_mark: |\n| DIAMBRA            | `pip install sheeprl[diambra]`  | [how_to/diambra](./howto/learn_in_diambra.md)   | :heavy_check_mark: |\n| Crafter            | `pip install sheeprl[crafter]`  | https://github.com/danijar/crafter             
 | :heavy_check_mark: |\n| Super Mario Bros   | `pip install sheeprl[supermario]` | https://github.com/Kautenja/gym-super-mario-bros/tree/master | :heavy_check_mark: |\n\n\n## Why\n\nWe want to provide a framework for RL algorithms that is at the same time simple and scalable thanks to Lightning Fabric.\n\nMoreover, in many RL repositories, the RL algorithm is tightly coupled with the environment, making it harder to extend them beyond the gym interface. We want to provide a framework that makes it easy to decouple the RL algorithm from the environment, so that it can be used with any environment.\n\n## How to use it\n\n### Installation\n\nThree options exist for installing SheepRL:\n\n1. Install the latest version directly from the [PyPi index](https://pypi.org/project/sheeprl/)\n2. Clone the repo and install the local version\n3. pip-install the framework using the GitHub clone URL\n\nInstructions for the three methods are shown below.\n\n#### Install SheepRL from PyPi\n\n\nYou can install the latest version of SheepRL with\n\n```bash\npip install sheeprl\n```\n\n\u003e [!NOTE]\n\u003e \n\u003e To install optional dependencies one can run for example `pip install sheeprl[atari,box2d,dev,mujoco,test]`\n\nFor detailed information about all the optional dependencies you can install, please have a look at the [What](#what) section.\n\n#### Cloning and installing a local version\n\nFirst, clone the repo with:\n\n```bash\ngit clone https://github.com/Eclectic-Sheep/sheeprl.git\ncd sheeprl\n```\n\nFrom inside the newly created folder run\n\n```bash\npip install .\n```\n\n\u003e [!NOTE]\n\u003e \n\u003e To install optional dependencies one can run for example `pip install .[atari,box2d,dev,mujoco,test]`\n\n#### Installing the framework from the GitHub repo\n\nIf you haven't already done so, create an environment with your choice of venv or conda.\n\n\u003e The example will use Python's standard venv module and assumes macOS or Linux.\n\n```sh\n# create a virtual 
environment\npython3 -m venv .venv\n\n# activate the environment\nsource .venv/bin/activate\n\n# if you do not wish to install extras such as mujoco or atari, do\npip install \"sheeprl @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with atari and mujoco environment support, do\npip install \"sheeprl[atari,mujoco,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with box2d environment support, do\npip install swig\npip install \"sheeprl[box2d] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with minedojo environment support, do\npip install \"sheeprl[minedojo,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with minerl environment support, do\npip install \"sheeprl[minerl,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with diambra environment support, do\npip install \"sheeprl[diambra,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install with super mario bros environment support, do\npip install \"sheeprl[supermario,dev] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n\n# or, to install all extras, do\npip install swig\npip install \"sheeprl[box2d,atari,mujoco,minerl,supermario,dev,test] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n```\n\n#### Additional: installing on an M-series Mac\n\n\u003e [!CAUTION]\n\u003e \n\u003e If you are on an M-series Mac and encounter an error attributed to box2dpy during installation, you need to install SWIG using the instructions shown below.\n\n\nIt is recommended to use [homebrew](https://brew.sh/) to install [SWIG](https://formulae.brew.sh/formula/swig) to support [Gym](https://github.com/openai/gym).\n\n```sh\n# if needed install homebrew\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n\n# then, do\nbrew install swig\n\n# then attempt to pip install with the preferred method, such as\npip install 
\"sheeprl[atari,box2d,mujoco,dev,test] @ git+https://github.com/Eclectic-Sheep/sheeprl.git\"\n```\n\n#### Additional: MineRL and MineDojo\n\n\u003e [!NOTE]\n\u003e \n\u003e If you want to install the *minedojo* or *minerl* environment support, Java JDK 8 is required: you can install it by following the instructions at this [link](https://docs.minedojo.org/sections/getting_started/install.html#on-ubuntu-20-04).\n\n\u003e [!CAUTION]\n\u003e\n\u003e **MineRL** and **MineDojo** environments have **conflicting requirements**, so **DO NOT install them together** with the `pip install sheeprl[minerl,minedojo]` command, but instead **install them individually** with either the command `pip install sheeprl[minerl]` or `pip install sheeprl[minedojo]` before running an experiment with the MineRL or MineDojo environment, respectively. \n\n### Run an experiment with SheepRL\n\nNow you can use one of the already available algorithms, or create your own.\nFor example, to train a PPO agent on the CartPole environment with only vector-like observations, just run\n\n```bash\npython sheeprl.py exp=ppo env=gym env.id=CartPole-v1\n```\n\nif you have installed from a cloned repo, or\n\n```bash\nsheeprl exp=ppo env=gym env.id=CartPole-v1\n```\n\nif you have installed SheepRL from PyPi.\n\nSimilarly, you can check all the available algorithms with\n\n```bash\npython sheeprl/available_agents.py\n```\n\nif you have installed from a cloned repo, or\n\n```bash\nsheeprl-agents\n```\nif you have installed SheepRL from PyPi.\n\nThat's all it takes to train an agent with SheepRL! 🎉\n\n\u003e Before you start using the SheepRL framework, it is **highly recommended** that you read the following instructional documents:\n\u003e \n\u003e 1. How to [run experiments](https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/run_experiments.md)\n\u003e 2. How to [modify the default configs](https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/configs.md)\n\u003e 3. 
How to [work with steps](https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/work_with_steps.md)\n\u003e 4. How to [select observations](https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/select_observations.md)\n\u003e\n\u003e Moreover, there are other useful documents in the [`howto` folder](https://github.com/Eclectic-Sheep/sheeprl/tree/main/howto); these documents contain some guidance on how to properly use the framework.\n\n### :chart_with_upwards_trend: Check your results\n\nOnce you have trained an agent, a new folder called `logs` will be created, containing the logs of the training. You can visualize them with [TensorBoard](https://www.tensorflow.org/tensorboard):\n\n```bash\ntensorboard --logdir logs\n```\n\nhttps://github.com/Eclectic-Sheep/sheeprl/assets/7341604/46ad4acd-180d-449d-b46a-25b4a1f038d9\n\n### :nerd_face: More about running an algorithm\n\nWhat you ran is the PPO algorithm with the default configuration. But you can also change the configuration by passing arguments to the script.\n\nFor example, in the default configuration, the number of parallel environments is 4. Let's try to change it to 8 by overriding the `env.num_envs` argument:\n\n```bash\nsheeprl exp=ppo env=gym env.id=CartPole-v1 env.num_envs=8\n```\n\nAll the available arguments, with their descriptions, are listed in the `sheeprl/configs` directory. You can find more information about the hierarchy of configs [here](./howto/run_experiments.md).\n\n### Running with Lightning Fabric\n\nTo run the algorithm with Lightning Fabric, you need to specify the Fabric parameters through the CLI. 
For example, to run the PPO algorithm with 4 parallel environments on 2 devices, you can run:\n\n```bash\nsheeprl fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2 exp=ppo env=gym env.id=CartPole-v1\n```\n\nYou can check the available parameters for Lightning Fabric [here](https://lightning.ai/docs/fabric/stable/api/fabric_args.html).\n\n### Evaluate your Agents\n\nYou can easily evaluate your trained agents from checkpoints: training configurations are retrieved automatically.\n\n```bash\nsheeprl-eval checkpoint_path=/path/to/checkpoint.ckpt fabric.accelerator=gpu env.capture_video=True\n```\n\nFor more information, check the corresponding [howto](./howto/eval_your_agent.md).\n\n## :book: Repository structure\n\nThe repository is structured as follows:\n\n- `algos`: contains the implementations of the algorithms. Each algorithm is in a separate folder, and (possibly) contains the following files:\n\n  - `\u003calgorithm\u003e.py`: contains the implementation of the algorithm.\n  - `\u003calgorithm\u003e_decoupled.py`: contains the implementation of the decoupled version of the algorithm, if present.\n  - `agent`: optional, contains the implementation of the agent.\n  - `loss.py`: contains the implementation of the loss functions of the algorithm.\n  - `utils.py`: contains utility functions for the algorithm.\n- `configs`: contains the default configs of the algorithms.\n- `data`: contains the implementation of the data buffers.\n- `envs`: contains the implementation of the environment wrappers.\n- `models`: contains the implementation of some standard models (building blocks), like the multi-layer perceptron (MLP) or a simple convolutional network (NatureCNN).\n- `utils`: contains utility functions for the framework.\n\n#### Coupled vs Decoupled\n\nIn the coupled version of an algorithm, the agent interacts with the environment and executes the training loop.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"./assets/images/sheeprl_coupled.png\"\u003e\n\u003c/p\u003e\n\nIn the decoupled version, a process is responsible only for interacting with the environment, and all the other processes are responsible for executing the training loop. The two processes communicate through [distributed collectives, adopting the abstraction provided by Fabric's TorchCollective](https://lightning.ai/docs/fabric/stable/api/generated/lightning.fabric.plugins.collectives.TorchCollective.html#lightning.fabric.plugins.collectives.TorchCollective).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/images/sheeprl_decoupled.png\"\u003e\n\u003c/p\u003e\n\n#### Coupled\n\nThe algorithm is implemented in the `\u003calgorithm\u003e.py` file.\n\nThere are 2 functions inside this script:\n\n- `main()`: initializes all the components of the algorithm, and executes the interactions with the environment. Once enough data is collected, the training loop is executed by calling the `train()` function.\n- `train()`: executes the training loop. It samples a batch of data from the buffer, computes the loss, and updates the parameters of the agent.\n\n#### Decoupled\n\nThe decoupled version of an algorithm is implemented in the `\u003calgorithm\u003e_decoupled.py` file.\n\nThere are 3 functions inside this script:\n\n- `main()`: initializes all the components of the algorithm, the collectives for the communication between the player and the trainers, and calls the `player()` and `trainer()` functions.\n- `player()`: executes the interactions with the environment. It samples an action from the policy network, executes it in the environment, and stores the transition in the buffer. After a predefined number of interactions with the environment, the player randomly splits the collected data into almost equal chunks and sends them separately to the trainers. It then waits for the trainers to finish the agent update.\n- `trainer()`: executes the training loop. 
It receives a chunk of data from the player, computes the loss, and updates the parameters of the agent. After the agent has been updated, the first of the trainers sends back the updated agent weights to the player, which can interact again with the environment.\n\n## Algorithms implementation\n\nFor details about the implementation, check the `README.md` file inside the folder of each algorithm.\n\nAll algorithms are kept as simple as possible, in a [CleanRL](https://github.com/vwxyzjn/cleanrl) fashion. But to allow for more flexibility and also more clarity, we tried to abstract away anything that is not strictly related to the training loop of the algorithm.\n\nFor example, we decided to create a `models` folder with already-made models that can be composed to create the model of the agent.\n\nFor each algorithm, losses are kept in a separate module, so that their implementation is clear and can be easily utilized for the decoupled or the recurrent version of the algorithm.\n\n## :card_index_dividers: Buffer\n\nFor the buffer implementation, we chose to use a wrapper around a dictionary of Numpy arrays.\n\nTo enable a simple way to work with Numpy memory-mapped arrays, we implemented the `sheeprl.utils.memmap.MemmapArray`, a container that handles the memory-mapped arrays.\n\nThis flexibility makes it very simple to implement, with the classes `ReplayBuffer`, `SequentialReplayBuffer`, `EpisodeBuffer`, and `EnvIndependentReplayBuffer`, all the buffers needed for on-policy and off-policy algorithms.\n\n### :mag: Technical details\n\nThe shape of the Numpy arrays in the dictionary is `(T, B, *)`, where `T` is the number of timesteps, `B` is the number of parallel environments, and `*` is the shape of the data.\n\nFor the `ReplayBuffer` to be used as a RolloutBuffer, the proper `buffer_size` must be specified. 
For example, for PPO, the `buffer_size` must be `[T, B]`, where `T` is the number of timesteps and `B` is the number of parallel environments.\n\n## :bow: Contributing\n\nThe best way to contribute is by opening an issue to discuss a new feature or a bug, or by opening a PR to fix a bug or to add a new feature.\n\n## :mailbox_with_no_mail: Contacts\n\nYou can contact us for any further questions or discussions:\n\n- Federico Belotti: belo.fede@outlook.com\n- Davide Angioni: davide.angioni@orobix.com\n- Refik Can Malli: refikcan.malli@orobix.com\n- Michele Milesi: michele.milesi@orobix.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEclectic-Sheep%2Fsheeprl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEclectic-Sheep%2Fsheeprl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEclectic-Sheep%2Fsheeprl/lists"}