{"id":13678837,"url":"https://github.com/medipixel/rl_algorithms","last_synced_at":"2025-04-29T15:33:26.297Z","repository":{"id":45411518,"uuid":"161100560","full_name":"medipixel/rl_algorithms","owner":"medipixel","description":"Structural implementation of RL key algorithms","archived":false,"fork":false,"pushed_at":"2023-04-08T09:15:39.000Z","size":2727,"stargazers_count":503,"open_issues_count":15,"forks_count":63,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-08-02T13:24:50.309Z","etag":null,"topics":["deep-learning","dqn","gym","policy-gradient","python3","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://www.medipixel.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/medipixel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-12-10T01:40:01.000Z","updated_at":"2024-08-01T13:39:54.000Z","dependencies_parsed_at":"2024-01-12T17:34:43.420Z","dependency_job_id":"6636b368-5fd0-4404-aee2-21c671a4c861","html_url":"https://github.com/medipixel/rl_algorithms","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medipixel%2Frl_algorithms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medipixel%2Frl_algorithms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medipixel%2Frl_algorithms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medipixel%2Frl_algorithms/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/medipixel","download_url":"https://codeload.github.com/medipixel/rl_algorithms/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224179149,"owners_count":17269011,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","dqn","gym","policy-gradient","python3","pytorch","reinforcement-learning"],"created_at":"2024-08-02T13:00:58.892Z","updated_at":"2025-04-29T15:33:26.287Z","avatar_url":"https://github.com/medipixel.png","language":"Python","funding_links":[],"categories":["Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL)","Python"],"sub_categories":["RL/DRL Algorithm Implementations and Software Frameworks"],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://user-images.githubusercontent.com/17582508/52845370-4a930200-314a-11e9-9889-e00007043872.jpg\" align=\"center\"\u003e\n\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/medipixel/rl_algorithms.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/medipixel/rl_algorithms/context:python)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\u003c!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section --\u003e\n[![All Contributors](https://img.shields.io/badge/all_contributors-10-orange.svg?style=flat-square)](#contributors-)\n\u003c!-- 
ALL-CONTRIBUTORS-BADGE:END --\u003e\n\n\u003c/p\u003e\n\n## Contents\n\n* [Welcome!](https://github.com/medipixel/rl_algorithms#welcome)\n* [Contributors](https://github.com/medipixel/rl_algorithms#contributors)\n* [Algorithms](https://github.com/medipixel/rl_algorithms#algorithms)\n* [Performance](https://github.com/medipixel/rl_algorithms#performance)\n* [Getting Started](https://github.com/medipixel/rl_algorithms#getting-started)\n* [Class Diagram](https://github.com/medipixel/rl_algorithms#class-diagram)\n* [References](https://github.com/medipixel/rl_algorithms#references)\n\n\n## Welcome!\nThis repository contains reinforcement learning algorithms used for research at Medipixel. The source code is updated frequently.\nExternal contributors are warmly welcome! :)\n\n|\u003cimg src=\"https://user-images.githubusercontent.com/17582508/52840582-18c76e80-313d-11e9-9752-3d6138f39a15.gif\" width=\"260\" height=\"180\"/\u003e|\u003cimg src=\"https://media.giphy.com/media/ZxLNajigOcLyeUnOwg/giphy.gif\" width=\"160\" height=\"180\"/\u003e|\u003cimg src=\"https://media.giphy.com/media/1mikGEln2lArKMQ6Pt/giphy.gif\" width=\"260\" height=\"180\"/\u003e|\n|:---:|:---:|:---:|\n|BC agent on LunarLanderContinuous-v2|RainbowIQN agent on PongNoFrameskip-v4|SAC agent on Reacher-v2|\n\n## Contributors\n\nThanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):\n\u003c!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --\u003e\n\u003c!-- prettier-ignore-start --\u003e\n\u003c!-- markdownlint-disable --\u003e\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/Curt-Park\"\u003e\u003cimg src=\"https://avatars3.githubusercontent.com/u/14961526?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eJinwoo Park (Curt)\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca 
href=\"https://github.com/medipixel/rl_algorithms/commits?author=Curt-Park\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/MrSyee\"\u003e\u003cimg src=\"https://avatars3.githubusercontent.com/u/17582508?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eKyunghwan Kim\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=MrSyee\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/darthegg\"\u003e\u003cimg src=\"https://avatars3.githubusercontent.com/u/16010242?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003edarthegg\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=darthegg\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/mclearning2\"\u003e\u003cimg src=\"https://avatars3.githubusercontent.com/u/43226417?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eMincheol Kim\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=mclearning2\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/minseop4898\"\u003e\u003cimg src=\"https://avatars1.githubusercontent.com/u/34338299?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003e김민섭\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=minseop4898\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://github.com/jinPrelude\"\u003e\u003cimg src=\"https://avatars1.githubusercontent.com/u/16518993?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eLeejin Jung\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=jinPrelude\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/cyoon1729\"\u003e\u003cimg src=\"https://avatars2.githubusercontent.com/u/33583101?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eChris Yoon\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=cyoon1729\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://jiseonghan.github.io/\"\u003e\u003cimg src=\"https://avatars2.githubusercontent.com/u/48741026?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eJiseong Han\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=jiseongHAN\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/sehyun-hwang\"\u003e\u003cimg src=\"https://avatars3.githubusercontent.com/u/23437715?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eSehyun Hwang\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"#maintenance-sehyun-hwang\" title=\"Maintenance\"\u003e🚧\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/isk03276\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/23740495?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr 
/\u003e\u003csub\u003e\u003cb\u003eeunjin\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/medipixel/rl_algorithms/commits?author=isk03276\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003c!-- markdownlint-restore --\u003e\n\u003c!-- prettier-ignore-end --\u003e\n\n\u003c!-- ALL-CONTRIBUTORS-LIST:END --\u003e\nThis project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification.\n\n## Algorithms\n\n0. [Advantage Actor-Critic (A2C)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/a2c)\n1. [Deep Deterministic Policy Gradient (DDPG)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/ddpg)\n2. [Proximal Policy Optimization Algorithms (PPO)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/ppo)\n3. [Twin Delayed Deep Deterministic Policy Gradient Algorithm (TD3)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/td3)\n4. [Soft Actor Critic Algorithm (SAC)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/sac)\n5. [Behaviour Cloning (BC with DDPG, SAC)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/bc)\n6. [From Demonstrations (DDPGfD, SACfD, DQfD)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/fd)\n7. [Rainbow DQN](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/dqn)\n8. [Rainbow IQN (without DuelingNet)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/dqn) - DuelingNet [degrades performance](https://github.com/medipixel/rl_algorithms/pull/137)\n9. Rainbow IQN (with [ResNet](https://github.com/medipixel/rl_algorithms/blob/master/rl_algorithms/common/networks/backbones/resnet.py))\n10. [Recurrent Replay DQN (R2D1)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/recurrent)\n11. 
[Distributed Prioritized Experience Replay (Ape-X)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/common/apex)\n12. [Policy Distillation](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/distillation)\n13. [Generative Adversarial Imitation Learning (GAIL)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/gail)\n14. [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/acer)\n\n## Performance\n\nWe have tested each algorithm on some of the following environments.\n- [PongNoFrameskip-v4](https://github.com/medipixel/rl_algorithms/tree/master/configs/pong_no_frameskip_v4)\n- [LunarLanderContinuous-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/lunarlander_continuous_v2)\n- [LunarLander-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/lunarlander_v2)\n- [Reacher-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/reacher-v2)\n\n❗Please note that this section is not updated frequently.\n\n\n#### PongNoFrameskip-v4\n\n**RainbowIQN** learns the game incredibly fast! It achieves the perfect score (21) [within 100 episodes](https://app.wandb.ai/curt-park/dqn/runs/b2p9e9f7/logs)!\nThe idea of RainbowIQN was roughly suggested by [W. Dabney et al.](https://arxiv.org/pdf/1806.06923.pdf).\n\nSee [W\u0026B Log](https://app.wandb.ai/curt-park/dqn/reports?view=curt-park%2FPong%20%28DQN%20%2F%20C51%20%2F%20IQN%20%2F%20IQN%20-double%20q%29) for more details. (The performance is measured on the commit [4248057](https://github.com/medipixel/rl_algorithms/pull/158))\n\n![pong_dqn](https://user-images.githubusercontent.com/17582508/56282434-1e93fd00-614a-11e9-9c31-af32e119d5b6.png)\n\n**RainbowIQN with ResNet**'s performance and learning speed were similar to those of RainbowIQN. 
We also confirmed that **R2D1 (w/ Dueling, PER)** converges well in the Pong environment, though not as fast as RainbowIQN (in terms of update steps).\n\nAlthough we were only able to test **Ape-X DQN (w/ Dueling)** with 4 workers due to limited computing power, we observed a significant speed-up in carrying out update steps (with batch size 512). Ape-X DQN learns Pong in about 2 hours, compared to 4 hours for serial Dueling DQN.\n\nSee [W\u0026B Log](https://app.wandb.ai/medipixel_rl/PongNoFrameskip-v4/reports/200626-integration-test--VmlldzoxNTE1NjE) for more details. (The performance is measured on the commit [9e897ad](https://github.com/medipixel/rl_algorithms/commit/9e897adfe93600c1db85ce1a7e064064b025c2c3))\n\n![pong dqn with resnet \u0026 rnn](https://user-images.githubusercontent.com/17582508/85813189-80fc7a80-b79d-11ea-96cf-947a62e380f3.png)\n\n![apex dqn](https://user-images.githubusercontent.com/17582508/85814263-83ac9f00-b7a0-11ea-9cdc-ff29de9a6d54.png)\n\n#### LunarLander-v2 / LunarLanderContinuous-v2\n\nWe used these environments only for quick verification of each algorithm, so some of the experiments may not show the best performance.\n\n##### 👇 Click the following lines to see the figures.\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLander-v2: RainbowDQN, RainbowDQfD, R2D1\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://app.wandb.ai/medipixel_rl/LunarLander-v2/reports/200626-integration-test--VmlldzoxNTE2MzA\"\u003eW\u0026B log\u003c/a\u003e for more details. 
(The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/commit/9e897adfe93600c1db85ce1a7e064064b025c2c3\"\u003e9e897ad\u003c/a\u003e)\n\n![lunarlander-v2_dqn](https://user-images.githubusercontent.com/17582508/85815561-a5f3ec00-b7a3-11ea-8d7c-8d54953d0c07.png)\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLander-v2: ACER, RainbowDQN, R2D1\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://wandb.ai/chaehyeuk-lee/LunarLander-v2/reports/LunarLander-v2-ACER--VmlldzoxMDU4OTQ1?accessToken=yxrr1h1t2d4n3j22hjz4ktzzgkpuhrm7txlyfpl3jb74les23vbfovvw5g64xgtg\"\u003eW\u0026B log\u003c/a\u003e for more details. (The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/pull/298/commits/82fae77f55f94bb4bc3fb7fc9c44b54dc232c4ff\"\u003e82fae77\u003c/a\u003e)\n\n![lunarlander-v2_acer](https://user-images.githubusercontent.com/48741026/134847201-c7ce6d9f-e930-497f-9473-05da7620095b.png)\n\u003c/p\u003e\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLanderContinuous-v2: A2C, PPO, DDPG, TD3, SAC\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://app.wandb.ai/medipixel_rl/LunarLanderContinuous-v2/reports/200626-integration-test--VmlldzoxNDg1MjU\"\u003eW\u0026B log\u003c/a\u003e for more details. 
(The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/commit/9e897adfe93600c1db85ce1a7e064064b025c2c3\"\u003e9e897ad\u003c/a\u003e)\n\n![lunarlandercontinuous-v2_baselines](https://user-images.githubusercontent.com/17582508/85818298-43065300-b7ab-11ea-9ee0-1eda855498ed.png)\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLanderContinuous-v2: DDPG, DDPGfD, BC-DDPG\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://app.wandb.ai/medipixel_rl/LunarLanderContinuous-v2/reports/200626-integration-test--VmlldzoxNDg1MjU\"\u003eW\u0026B log\u003c/a\u003e for more details. (The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/commit/9e897adfe93600c1db85ce1a7e064064b025c2c3\"\u003e9e897ad\u003c/a\u003e)\n\n![lunarlandercontinuous-v2_ddpg](https://user-images.githubusercontent.com/17582508/85818519-c9bb3000-b7ab-11ea-9473-08476a959a0c.png)\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLanderContinuous-v2: SAC, SACfD, BC-SAC\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://app.wandb.ai/medipixel_rl/LunarLanderContinuous-v2/reports/200626-integration-test--VmlldzoxNDg1MjU\"\u003eW\u0026B log\u003c/a\u003e for more details. 
(The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/commit/9e897adfe93600c1db85ce1a7e064064b025c2c3\"\u003e9e897ad\u003c/a\u003e)\n\n![lunarlandercontinuous-v2_sac](https://user-images.githubusercontent.com/17582508/85818654-1acb2400-b7ac-11ea-8641-d559839cab62.png)\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eLunarLanderContinuous-v2: PPO, SAC, GAIL\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\nSee \u003ca href=\"https://wandb.ai/chaehyeuk-lee/LunarLanderContinuous-v2?workspace=user-chaehyeuk-lee\"\u003eW\u0026B log\u003c/a\u003e for more details. (The performance is measured on the commit \u003ca href=\"https://github.com/medipixel/rl_algorithms/commit/922222b2e249f1f14bdf1a28c9f0f00752e49907\"\u003e922222b\u003c/a\u003e)\n\n![lunarlandercontinuous-v2_gail](https://user-images.githubusercontent.com/23740495/130401442-8b668975-8760-4a79-b757-1c1e9a9c4e47.png)\n\u003c/p\u003e\n\u003c/details\u003e\n\n#### Reacher-v2\n\nWe reproduced the performance of **DDPG**, **TD3**, and **SAC** on Reacher-v2 (Mujoco). They reach scores of around -3.5 to -4.5.\n\n##### 👇 Click the following line to see the figures.\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eReacher-v2: DDPG, TD3, SAC\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\u003cbr\u003e\n\nSee [W\u0026B Log](https://app.wandb.ai/medipixel_rl/reacher-v2/reports?view=curt-park%2FBaselines%20%23158) for more details.\n\n![reacher-v2_baselines](https://user-images.githubusercontent.com/17582508/56282421-163bc200-614a-11e9-8d4d-2bb520575fbb.png)\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n## Getting started\n\n#### Prerequisites\n* This repository is tested in an [Anaconda](https://www.anaconda.com/distribution/) virtual environment with Python 3.6.1+\n    ```\n    $ conda create -n rl_algorithms python=3.7.9\n    $ conda activate rl_algorithms\n    ```\n* In order to run Mujoco environments (e.g. 
`Reacher-v2`), you need to acquire a [Mujoco license](https://www.roboti.us/license.html).\n\n#### Installation\nFirst, clone the repository.\n```\ngit clone https://github.com/medipixel/rl_algorithms.git\ncd rl_algorithms\n```\n\n###### For users\nInstall the packages required to run the code (this includes `python setup.py install`). Just type:\n```\nmake dep\n```\n\n###### For developers\nIf you want to modify the code, you should set up the formatting and linting tools. `make dev` configures them to run automatically when you commit, and, unlike `make dep`, it installs the package with `python setup.py develop`. Just type:\n\n```\nmake dev\n```\n\nAfter running `make dev`, you can validate the code with the following commands:\n```\nmake format  # for formatting\nmake test  # for linting\n```\n\n#### Usage\nYou can train or test `algorithm` on `env_name` if `configs/env_name/algorithm.yaml` exists. (`configs/env_name/algorithm.yaml` contains the hyperparameters.)\n```\npython run_env_name.py --cfg-path \u003cconfig-path\u003e\n```\n\ne.g. running soft actor-critic on LunarLanderContinuous-v2.\n```\npython run_lunarlander_continuous_v2.py --cfg-path ./configs/lunarlander_continuous_v2/sac.yaml \u003cother-options\u003e\n```\n\ne.g. running a custom agent, **if you have written your own configs**: `configs/env_name/ddpg-custom.yaml`.\n```\npython run_env_name.py --cfg-path ./configs/lunarlander_continuous_v2/ddpg-custom.py\n```\nYou will see the agent run with the hyperparameters and model settings you configured.\n\n#### Arguments for run-files\n\nIn addition, the run files accept various command-line arguments. 
To see all the options of a run file, type:\n```\npython \u003crun-file\u003e -h\n```\n- `--test`\n    - Start test mode (no training).\n- `--off-render`\n    - Turn off rendering.\n- `--log`\n    - Turn on logging using [W\u0026B](https://www.wandb.com/).\n- `--seed \u003cint\u003e`\n    - Set the random seed.\n- `--save-period \u003cint\u003e`\n    - Set the saving period of the model and optimizer parameters.\n- `--max-episode-steps \u003cint\u003e`\n    - Set the maximum number of steps per episode. If the number is less than or equal to 0, the environment's default maximum is used.\n- `--episode-num \u003cint\u003e`\n    - Set the number of episodes for training.\n- `--render-after \u003cint\u003e`\n    - Start rendering after the given number of episodes.\n- `--load-from \u003csave-file-path\u003e`\n    - Load the saved models and optimizers at the beginning.\n\n#### Show feature maps with Grad-CAM and saliency maps\nYou can visualize the features that the trained agent extracts using **[Grad-CAM (Gradient-weighted Class Activation Mapping)](https://arxiv.org/pdf/1610.02391.pdf)** and **[Saliency map](https://arxiv.org/pdf/1312.6034.pdf)**.\n\nGrad-CAM combines feature maps using the gradient signal to produce a coarse localization map of the important regions in the image. You can use it by adding a [Grad-CAM config](https://github.com/medipixel/rl_algorithms/blob/master/configs/pong_no_frameskip_v4/dqn.py#L39) and the `--grad-cam` flag when you run. For example:\n```\npython run_env_name.py --cfg-path \u003cconfig-path\u003e --test --grad-cam\n```\nThe results will be rendered as follows:\n\n\u003cimg src=\"https://user-images.githubusercontent.com/17582508/79204132-02b75a00-7e77-11ea-9c78-ab543055bd4f.gif\" width=\"400\" height=\"400\" align=\"center\"/\u003e\n\nYou can also use a saliency map in a similar way to Grad-CAM by adding the `--saliency-map` flag. The saliency map requires trained weights, loaded with the `--load-from` flag. 
\n```\npython run_env_name.py --cfg-path \u003cconfig-path\u003e --load-from \u003csave-file-path\u003e --test --saliency-map\n```\nSaliency maps are stored in `data/saliency_map`.\n\n\u003cimg src=\"https://user-images.githubusercontent.com/16518993/106556182-56a84200-6562-11eb-8c9c-a5b19629335c.gif\" width=\"400\" height=\"140\" align=\"center\"/\u003e\n\nBoth Grad-CAM and the saliency map can only be used for agents with convolutional layers, such as **DQN for the Pong environment**. You can see the feature maps of all configured convolutional layers.\n\n\n#### Using policy distillation\n\nThe documentation on using policy distillation is in a separate file: [rl_algorithms/distillation/README.md](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/distillation).\n\n\n#### W\u0026B for logging\nWe use [W\u0026B](https://www.wandb.com/) to log network parameters and other metrics. To enable logging, follow the steps below after installing the requirements:\n\n\u003e0. Create a [wandb](https://www.wandb.com/) account\n\u003e1. Find your **API key** in settings, and log in to wandb in your terminal: `$ wandb login API_KEY`\n\u003e2. Initialize wandb: `$ wandb init`\n\nFor more details, read the [W\u0026B tutorial](https://docs.wandb.com/docs/started.html).\n\n## Class Diagram\nClass diagram from [#135](https://github.com/medipixel/rl_algorithms/pull/135).\n\n❗This is not updated frequently.\n\n![RL_Algorithms_ClassDiagram](https://user-images.githubusercontent.com/16010242/55934443-812d5a80-5c6b-11e9-9b31-fa8214965a55.png)\n\n## Citing the Project\nTo cite this repository in publications:\n```\n@misc{rl_algorithms,\n  author = {Kim, Kyunghwan and Lee, Chaehyuk and Jeong, Euijin and Han, Jiseong and Kim, Minseop and Yoon, Chris and Kim, Mincheol and Park, Jinwoo},\n  title = {Medipixel RL algorithms},\n  year = {2020},\n  publisher = {Github},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/medipixel/rl_algorithms}},\n}\n```\n\n## References\n0. 
[T. P. Lillicrap et al., \"Continuous control with deep reinforcement learning.\" arXiv preprint arXiv:1509.02971, 2015.](https://arxiv.org/pdf/1509.02971.pdf)\n1. [J. Schulman et al., \"Proximal Policy Optimization Algorithms.\" arXiv preprint arXiv:1707.06347, 2017.](https://arxiv.org/abs/1707.06347.pdf)\n2. [S. Fujimoto et al., \"Addressing function approximation error in actor-critic methods.\" arXiv preprint arXiv:1802.09477, 2018.](https://arxiv.org/pdf/1802.09477.pdf)\n3. [T. Haarnoja et al., \"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.\" arXiv preprint arXiv:1801.01290, 2018.](https://arxiv.org/pdf/1801.01290.pdf)\n4. [T. Haarnoja et al., \"Soft Actor-Critic Algorithms and Applications.\" arXiv preprint arXiv:1812.05905, 2018.](https://arxiv.org/pdf/1812.05905.pdf)\n5. [T. Schaul et al., \"Prioritized Experience Replay.\" arXiv preprint arXiv:1511.05952, 2015.](https://arxiv.org/pdf/1511.05952.pdf)\n6. [M. Andrychowicz et al., \"Hindsight Experience Replay.\" arXiv preprint arXiv:1707.01495, 2017.](https://arxiv.org/pdf/1707.01495.pdf)\n7. [A. Nair et al., \"Overcoming Exploration in Reinforcement Learning with Demonstrations.\" arXiv preprint arXiv:1709.10089, 2017.](https://arxiv.org/pdf/1709.10089.pdf)\n8. [M. Vecerik et al., \"Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards.\" arXiv preprint arXiv:1707.08817, 2017.](https://arxiv.org/pdf/1707.08817.pdf)\n9. [V. Mnih et al., \"Human-level control through deep reinforcement learning.\" Nature, 518(7540):529–533, 2015.](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)\n10. [H. van Hasselt et al., \"Deep Reinforcement Learning with Double Q-learning.\" arXiv preprint arXiv:1509.06461, 2015.](https://arxiv.org/pdf/1509.06461.pdf)\n11. [Z. 
Wang et al., \"Dueling Network Architectures for Deep Reinforcement Learning.\" arXiv preprint arXiv:1511.06581, 2015.](https://arxiv.org/pdf/1511.06581.pdf)\n12. [T. Hester et al., \"Deep Q-learning from Demonstrations.\" arXiv preprint arXiv:1704.03732, 2017.](https://arxiv.org/pdf/1704.03732.pdf)\n13. [M. G. Bellemare et al., \"A Distributional Perspective on Reinforcement Learning.\" arXiv preprint arXiv:1707.06887, 2017.](https://arxiv.org/pdf/1707.06887.pdf)\n14. [M. Fortunato et al., \"Noisy Networks for Exploration.\" arXiv preprint arXiv:1706.10295, 2017.](https://arxiv.org/pdf/1706.10295.pdf)\n15. [M. Hessel et al., \"Rainbow: Combining Improvements in Deep Reinforcement Learning.\" arXiv preprint arXiv:1710.02298, 2017.](https://arxiv.org/pdf/1710.02298.pdf)\n16. [W. Dabney et al., \"Implicit Quantile Networks for Distributional Reinforcement Learning.\" arXiv preprint arXiv:1806.06923, 2018.](https://arxiv.org/pdf/1806.06923.pdf)\n17. [R. R. Selvaraju et al., \"Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.\" arXiv preprint arXiv:1610.02391, 2016.](https://arxiv.org/pdf/1610.02391.pdf)\n18. [K. He et al., \"Deep Residual Learning for Image Recognition.\" arXiv preprint arXiv:1512.03385, 2015.](https://arxiv.org/pdf/1512.03385)\n19. [S. Kapturowski et al., \"Recurrent Experience Replay in Distributed Reinforcement Learning.\" In International Conference on Learning Representations, 2019.](https://openreview.net/forum?id=r1lyTjAqYX)\n20. [D. Horgan et al., \"Distributed Prioritized Experience Replay.\" In International Conference on Learning Representations, 2018.](https://arxiv.org/pdf/1803.00933.pdf)\n21. [K. Simonyan et al., \"Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps.\" arXiv preprint arXiv:1312.6034, 2013.](https://arxiv.org/pdf/1312.6034.pdf)\n22. 
[J. Ho and S. Ermon, \"Generative Adversarial Imitation Learning.\" arXiv preprint arXiv:1606.03476, 2016.](https://arxiv.org/abs/1606.03476)\n23. [Z. Wang et al., \"Sample Efficient Actor-Critic with Experience Replay.\" arXiv preprint arXiv:1611.01224, 2016.](https://arxiv.org/pdf/1611.01224.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedipixel%2Frl_algorithms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmedipixel%2Frl_algorithms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedipixel%2Frl_algorithms/lists"}