{"id":16403874,"url":"https://github.com/sdpkjc/abcdrl","last_synced_at":"2025-08-08T18:42:11.456Z","repository":{"id":64792432,"uuid":"565051902","full_name":"sdpkjc/abcdrl","owner":"sdpkjc","description":"Modular Single-file Reinforcement Learning Algorithms Library","archived":false,"fork":false,"pushed_at":"2023-05-16T09:16:32.000Z","size":8302,"stargazers_count":37,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-11T05:50:59.922Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","machine-learning","python","pytorch","reinfocement-learning"],"latest_commit_sha":null,"homepage":"http://docs.abcdrl.xyz","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sdpkjc.png","metadata":{"files":{"readme":"README.cn.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-12T07:18:55.000Z","updated_at":"2024-09-17T10:55:29.000Z","dependencies_parsed_at":"2024-10-11T05:51:07.552Z","dependency_job_id":"c5522979-17ae-45e6-aec1-3f4f3a2d1219","html_url":"https://github.com/sdpkjc/abcdrl","commit_stats":{"total_commits":94,"total_committers":3,"mean_commits":"31.333333333333332","dds":"0.15957446808510634","last_synced_commit":"5747f2148fa4c522bd949af646437c3394435c98"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpkjc%2Fabcdrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpkjc%2Fabcdrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/sdpkjc%2Fabcdrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdpkjc%2Fabcdrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sdpkjc","download_url":"https://codeload.github.com/sdpkjc/abcdrl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221666089,"owners_count":16860366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","machine-learning","python","pytorch","reinfocement-learning"],"created_at":"2024-10-11T05:50:38.311Z","updated_at":"2024-10-27T10:56:52.570Z","avatar_url":"https://github.com/sdpkjc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **abcdRL** (Implement a Reinforcement Learning Algorithm in Four Simple Steps)\n\n[English](./README.md) | 简体中文\n\n[![license](https://img.shields.io/pypi/l/abcdrl)](https://github.com/sdpkjc/abcdrl)\n[![pytest](https://github.com/sdpkjc/abcdrl/actions/workflows/test.yml/badge.svg)](https://github.com/sdpkjc/abcdrl/actions/workflows/test.yml)\n[![pre-commit](https://github.com/sdpkjc/abcdrl/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/sdpkjc/abcdrl/actions/workflows/pre-commit.yml)\n[![pypi](https://img.shields.io/pypi/v/abcdrl)](https://pypi.org/project/abcdrl)\n[![docker autobuild](https://img.shields.io/docker/cloud/build/sdpkjc/abcdrl)](https://hub.docker.com/r/sdpkjc/abcdrl/)\n[![docs](https://img.shields.io/github/deployments/sdpkjc/abcdrl/Production?label=docs\u0026logo=vercel)](https://docs.abcdrl.xyz/)\n[![Gitpod 
ready-to-code](https://img.shields.io/badge/Gitpod-ready--to--code-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/sdpkjc/abcdrl)\n[![benchmark](https://img.shields.io/badge/Weights%20\u0026%20Biases-benchmark-FFBE00?logo=weightsandbiases)](https://report.abcdrl.xyz/)\n[![mirror repo](https://img.shields.io/badge/Gitee-mirror%20repo-black?style=flat\u0026labelColor=C71D23\u0026logo=gitee)](https://gitee.com/sdpkjc/abcdrl/)\n[![Checked with mypy](https://img.shields.io/badge/mypy-checked-blue)](http://mypy-lang.org/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![python versions](https://img.shields.io/pypi/pyversions/abcdrl)](https://pypi.org/project/abcdrl)\n\nabcdRL is a **Modular Single-file Reinforcement Learning Algorithms Library** that offers a modular design that is “present but not strict” together with clear single-file algorithm implementations.\n\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://docs.abcdrl.xyz/imgs/adam.svg\"\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://docs.abcdrl.xyz/imgs/adam_dark.svg\"\u003e\n  \u003cimg alt=\"adam\" src=\"https://docs.abcdrl.xyz/imgs/adam.svg\" width=\"300\"\u003e\n\u003c/picture\u003e\n\n*When reading the code, you can quickly grasp the complete implementation details of an algorithm from a single file; when improving an algorithm, the lightweight modular design lets you focus on only a few modules.*\n\n\u003e abcdRL mainly draws on the single-file design philosophy of [vwxyzjn/cleanrl](https://github.com/vwxyzjn/cleanrl/) and the module design of [PaddlePaddle/PARL](https://github.com/PaddlePaddle/PARL/).\n\n***Documentation ➡️ [docs.abcdrl.xyz](https://docs.abcdrl.xyz/zh/)***\n\n***Roadmap🗺️ [#57](https://github.com/sdpkjc/abcdrl/issues/57)***\n\n## 🚀 Quickstart\n\nOpen the project in Gitpod🌐 and start coding immediately.\n\n[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/sdpkjc/abcdrl)\n\nUsing Docker📦:\n\n```shell\n# 0. Install Docker \u0026 NVIDIA Driver \u0026 NVIDIA Container Toolkit\n# 1. 
Run the DQN algorithm\ndocker run --rm --gpus all sdpkjc/abcdrl python abcdrl/dqn_torch.py\n```\n\n***[Detailed installation instructions 👀](https://docs.abcdrl.xyz/zh/install/)***\n\n## 🐼 Features\n\n- 👨‍👩‍👧‍👦 Unified code structure\n- 📄 Single-file implementation\n- 🐷 Low code reuse\n- 📐 Minimal code differences\n- 📈 Tensorboard \u0026 Wandb integration\n- 🛤 Complies with PEP8 \u0026 PEP526\n\n## 🗽 Design Philosophy\n\n- “Copy📋”, ~~not “Inheritance🧬”~~\n- “Single-file📜”, ~~not “Multi-file📚”~~\n- “Feature reuse🛠”, ~~not “Algorithm reuse🖨”~~\n- “Consistent logic🤖”, ~~not “Consistent interface🔌”~~\n\n## ✅ Implemented Algorithms\n\n***Weights \u0026 Biases Benchmark Report ➡️ [report.abcdrl.xyz](https://report.abcdrl.xyz)***\n\n- [Deep Q Network (DQN)](https://doi.org/10.1038/nature14236) \u003csub\u003e`dqn_torch.py`, `dqn_tf.py`, `dqn_atari_torch.py`, `dqn_atari_tf.py`\u003c/sub\u003e\n- [Deep Deterministic Policy Gradient (DDPG)](http://arxiv.org/abs/1509.02971) \u003csub\u003e`ddpg_torch.py`\u003c/sub\u003e\n- [Twin Delayed Deep Deterministic Policy Gradient (TD3)](http://arxiv.org/abs/1802.09477) \u003csub\u003e`td3_torch.py`\u003c/sub\u003e\n- [Soft Actor-Critic (SAC)](http://arxiv.org/abs/1801.01290) \u003csub\u003e`sac_torch.py`\u003c/sub\u003e\n- [Proximal Policy Optimization (PPO)](http://arxiv.org/abs/1707.06347) \u003csub\u003e`ppo_torch.py`\u003c/sub\u003e\n\n---\n\n- [Double Deep Q Network (DDQN)](http://arxiv.org/abs/1509.06461) \u003csub\u003e`ddqn_torch.py`, `ddqn_tf.py`\u003c/sub\u003e\n- [Prioritized Deep Q Network (PDQN)](http://arxiv.org/abs/1511.05952) \u003csub\u003e`pdqn_torch.py`, `pdqn_tf.py`\u003c/sub\u003e\n\n## Citing abcdRL\n\n```bibtex\n@misc{zhao_abcdrl_2022,\n    author = {Yanxiao, Zhao},\n    month = {12},\n    title = {{abcdRL: Modular Single-file Reinforcement Learning Algorithms Library}},\n    url = {https://github.com/sdpkjc/abcdrl},\n    year = {2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdpkjc%2Fabcdrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsdpkjc%2Fabcdrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdpkjc%2Fabcdrl/lists"}