{"id":19287275,"url":"https://github.com/opendilab/di-engine","last_synced_at":"2025-05-12T15:28:58.760Z","repository":{"id":36980062,"uuid":"382787545","full_name":"opendilab/DI-engine","owner":"opendilab","description":"OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.","archived":false,"fork":false,"pushed_at":"2025-04-17T07:26:34.000Z","size":306391,"stargazers_count":3379,"open_issues_count":24,"forks_count":398,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-04-23T17:44:07.168Z","etag":null,"topics":["atari","distributed-reinforcement-learning","distributed-system","drl","exploration-exploitation","imitation-learning","impala","inverse-reinforcement-learning","minigrid","model-based-reinforcement-learning","mujoco","multiagent-reinforcement-learning","offline-rl","python","pytorch-rl","r2d2","reinforcement-learning","reinforcement-learning-algorithms","self-play","smac"],"latest_commit_sha":null,"homepage":"https://di-engine-docs.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opendilab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-04T07:11:05.000Z","updated_at":"2025-04-23T16:52:08.000Z","dependencies_parsed_at":"2023-09-22T13:26:59.775Z","dependency_job_id":"85ff0142-e776-42b0-8586-bfb13c174344","html_url":"https://github.com/opendilab/DI-engine","commit_stats":{"total_commits":758,"total_committers":55,"mean_commits":"13.781818181818181","dds":0.57519788918205
8,"last_synced_commit":"74c6a1e9230c19074dcd619ea4cafdcf2afb37ce"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FDI-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FDI-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FDI-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opendilab%2FDI-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opendilab","download_url":"https://codeload.github.com/opendilab/DI-engine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253765390,"owners_count":21960723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atari","distributed-reinforcement-learning","distributed-system","drl","exploration-exploitation","imitation-learning","impala","inverse-reinforcement-learning","minigrid","model-based-reinforcement-learning","mujoco","multiagent-reinforcement-learning","offline-rl","python","pytorch-rl","r2d2","reinforcement-learning","reinforcement-learning-algorithms","self-play","smac"],"created_at":"2024-11-09T22:05:41.907Z","updated_at":"2025-05-12T15:28:58.711Z","avatar_url":"https://github.com/opendilab.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://di-engine-docs.readthedocs.io/en/latest/\"\u003e\u003cimg width=\"1000px\" height=\"auto\" 
src=\"https://github.com/opendilab/DI-engine-docs/blob/main/source/images/head_image.png\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n---\n\n[![Twitter](https://img.shields.io/twitter/url?style=social\u0026url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https://twitter.com/opendilab)\n[![PyPI](https://img.shields.io/pypi/v/DI-engine)](https://pypi.org/project/DI-engine/)\n![Conda](https://anaconda.org/opendilab/di-engine/badges/version.svg)\n![Conda update](https://anaconda.org/opendilab/di-engine/badges/latest_release_date.svg)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/DI-engine)\n![PyTorch Version](https://img.shields.io/badge/dynamic/json?color=blue\u0026label=pytorch\u0026query=%24.pytorchVersion\u0026url=https%3A%2F%2Fgist.githubusercontent.com/PaParaZz1/54c5c44eeb94734e276b2ed5770eba8d/raw/85b94a54933a9369f8843cc2cea3546152a75661/badges.json)\n\n![Loc](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/3690cccd811e4c5f771075c2f785c7bb/raw/loc.json)\n![Comments](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/HansBug/3690cccd811e4c5f771075c2f785c7bb/raw/comments.json)\n\n![Style](https://github.com/opendilab/DI-engine/actions/workflows/style.yml/badge.svg)\n[![Read en Docs](https://github.com/opendilab/DI-engine/actions/workflows/doc.yml/badge.svg)](https://di-engine-docs.readthedocs.io/en/latest)\n[![Read zh_CN Docs](https://img.shields.io/readthedocs/di-engine-docs?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://di-engine-docs.readthedocs.io/zh_CN/latest)\n![Unittest](https://github.com/opendilab/DI-engine/actions/workflows/unit_test.yml/badge.svg)\n![Algotest](https://github.com/opendilab/DI-engine/actions/workflows/algo_test.yml/badge.svg)\n![deploy](https://github.com/opendilab/DI-engine/actions/workflows/deploy.yml/badge.svg)\n[![codecov](https://codecov.io/gh/opendilab/DI-engine/branch/main/graph/badge.svg?token=B0Q15JI301)](https://codecov.io/gh/opendilab/DI-engine)\n\n![GitHub 
Org's stars](https://img.shields.io/github/stars/opendilab)\n[![GitHub stars](https://img.shields.io/github/stars/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/stargazers)\n[![GitHub forks](https://img.shields.io/github/forks/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/network)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/opendilab/DI-engine)\n[![GitHub issues](https://img.shields.io/github/issues/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/issues)\n[![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/pulls)\n[![Contributors](https://img.shields.io/github/contributors/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/graphs/contributors)\n[![GitHub license](https://img.shields.io/github/license/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/blob/master/LICENSE)\n[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](https://huggingface.co/OpenDILabCommunity)\n[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models?search=opendilab)\n[![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)\n[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack\u0026amp)](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://hellogithub.com/repository/175c1e13739c4e429d0abf2b32ec583d\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://api.hellogithub.com/v1/widgets/recommend.svg?rid=175c1e13739c4e429d0abf2b32ec583d\u0026claim_uid=cExIpHuMKdTQ6BW\" alt=\"Featured｜HelloGitHub\" style=\"width: 250px; height: 54px;\" width=\"250\" height=\"54\" /\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\u003cbr\u003e\n\nUpdated on 2024.12.23 DI-engine-v0.5.3\n\n## 
Introduction to DI-engine\n\n[Documentation](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/) | [Tutorials](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/index.html) | [Feature](#feature) | [Task \u0026 Middleware](https://di-engine-docs.readthedocs.io/en/latest/03_system/index.html) | [TreeTensor](#general-data-container-treetensor) | [Roadmap](https://github.com/opendilab/DI-engine/issues/548)\n\n**DI-engine** is a generalized decision intelligence engine for PyTorch and JAX.\n\nIt provides **python-first** and **asynchronous-native** task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on the above mechanisms, DI-engine supports **various [deep reinforcement learning](https://di-engine-docs.readthedocs.io/en/latest/10_concepts/index.html) algorithms** with superior performance, high efficiency, well-organized [documentation](https://di-engine-docs.readthedocs.io/en/latest/) and [unittest](https://github.com/opendilab/DI-engine/actions):\n\n- Most basic DRL algorithms: such as DQN, Rainbow, PPO, TD3, SAC, R2D2, IMPALA\n- Multi-agent RL algorithms: such as QMIX, WQMIX, MAPPO, HAPPO, ACE\n- Imitation learning algorithms (BC/IRL/GAIL): such as GAIL, SQIL, Guided Cost Learning, Implicit BC\n- Offline RL algorithms: BCQ, CQL, TD3BC, Decision Transformer, EDAC, Diffuser, Decision Diffuser, SO2\n- Model-based RL algorithms: SVG, STEVE, MBPO, DDPPO, DreamerV3\n- Exploration algorithms: HER, RND, ICM, NGU\n- LLM + RL Algorithms: PPO-max, DPO, PromptPG, PromptAWR\n- Other algorithms: such as PER, PLR, PCGrad\n- MCTS + RL algorithms: AlphaZero, MuZero, please refer to [LightZero](https://github.com/opendilab/LightZero)\n- Generative Model + RL algorithms: Diffusion-QL, QGPO, SRPO, please refer to [GenerativeRL](https://github.com/opendilab/GenerativeRL)\n\n\n**DI-engine** aims to **standardize different 
Decision Intelligence environments and applications**, supporting both academic research and prototype applications. Various training pipelines and customized decision AI applications are also supported:\n\n\u003cdetails open\u003e\n\u003csummary\u003e(Click to Collapse)\u003c/summary\u003e\n\n- Traditional academic environments\n  - [DI-zoo](https://github.com/opendilab/DI-engine#environment-versatility): various decision intelligence demonstrations and benchmark environments with DI-engine.\n- Tutorial courses\n  - [PPOxFamily](https://github.com/opendilab/PPOxFamily): PPO x Family DRL Tutorial Course\n- Real world decision AI applications\n  - [DI-star](https://github.com/opendilab/DI-star): Decision AI in StarCraftII\n  - [PsyDI](https://github.com/opendilab/PsyDI): Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments\n  - [DI-drive](https://github.com/opendilab/DI-drive): Auto-driving platform\n  - [DI-sheep](https://github.com/opendilab/DI-sheep): Decision AI in 3 Tiles Game\n  - [DI-smartcross](https://github.com/opendilab/DI-smartcross): Decision AI in Traffic Light Control\n  - [DI-bioseq](https://github.com/opendilab/DI-bioseq): Decision AI in Biological Sequence Prediction and Searching\n  - [DI-1024](https://github.com/opendilab/DI-1024): Deep Reinforcement Learning + 1024 Game\n- Research paper\n  - [InterFuser](https://github.com/opendilab/InterFuser): [CoRL 2022] Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer\n  - [ACE](https://github.com/opendilab/ACE): [AAAI 2023] ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency\n  - [GoBigger](https://github.com/opendilab/GoBigger): [ICLR 2023] Multi-Agent Decision Intelligence Environment\n  - [DOS](https://github.com/opendilab/DOS): [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning\n  - [LightZero](https://github.com/opendilab/LightZero): [NeurIPS 2023 Spotlight] A lightweight and efficient 
MCTS/AlphaZero/MuZero algorithm toolkit\n  - [SO2](https://github.com/opendilab/SO2): [AAAI 2024] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning\n  - [LMDrive](https://github.com/opendilab/LMDrive): [CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models\n  - [SmartRefine](https://github.com/opendilab/SmartRefine): [CVPR 2024] SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction\n  - [ReZero](https://github.com/opendilab/LightZero): Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze\n  - [UniZero](https://github.com/opendilab/LightZero): Generalized and Efficient Planning with Scalable Latent World Models\n- Docs and Tutorials\n  - [DI-engine-docs](https://github.com/opendilab/DI-engine-docs): Tutorials, best practices and the API reference.\n  - [awesome-model-based-RL](https://github.com/opendilab/awesome-model-based-RL): A curated list of awesome Model-Based RL resources\n  - [awesome-exploration-RL](https://github.com/opendilab/awesome-exploration-rl): A curated list of awesome exploration RL resources\n  - [awesome-decision-transformer](https://github.com/opendilab/awesome-decision-transformer): A curated list of Decision Transformer resources\n  - [awesome-RLHF](https://github.com/opendilab/awesome-RLHF): A curated list of reinforcement learning with human feedback resources\n  - [awesome-multi-modal-reinforcement-learning](https://github.com/opendilab/awesome-multi-modal-reinforcement-learning): A curated list of Multi-Modal Reinforcement Learning resources\n  - [awesome-diffusion-model-in-rl](https://github.com/opendilab/awesome-diffusion-model-in-rl): A curated list of Diffusion Model in RL resources\n  - [awesome-ui-agents](https://github.com/opendilab/awesome-ui-agents): A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond\n  - 
[awesome-AI-based-protein-design](https://github.com/opendilab/awesome-AI-based-protein-design): a collection of research papers for AI-based protein design\n  - [awesome-end-to-end-autonomous-driving](https://github.com/opendilab/awesome-end-to-end-autonomous-driving): A curated list of awesome End-to-End Autonomous Driving resources\n  - [awesome-driving-behavior-prediction](https://github.com/opendilab/awesome-driving-behavior-prediction): A collection of research papers for Driving Behavior Prediction\n\n  \u003c/details\u003e\n\nOn the low-level end, DI-engine comes with a set of highly re-usable modules, including [RL optimization functions](https://github.com/opendilab/DI-engine/tree/main/ding/rl_utils), [PyTorch utilities](https://github.com/opendilab/DI-engine/tree/main/ding/torch_utils) and [auxiliary tools](https://github.com/opendilab/DI-engine/tree/main/ding/utils).\n\nBTW, **DI-engine** also has some special **system optimization and design** for efficient and robust large-scale RL training:\n\n\u003cdetails close\u003e\n\u003csummary\u003e(Click for Details)\u003c/summary\u003e\n\n- [treevalue](https://github.com/opendilab/treevalue): Tree-nested data structure\n- [DI-treetensor](https://github.com/opendilab/DI-treetensor): Tree-nested PyTorch tensor Lib\n- [DI-toolkit](https://github.com/opendilab/DI-toolkit): A simple toolkit package for decision intelligence\n- [DI-orchestrator](https://github.com/opendilab/DI-orchestrator): RL Kubernetes Custom Resource and Operator Lib\n- [DI-hpc](https://github.com/opendilab/DI-hpc): RL HPC OP Lib\n- [DI-store](https://github.com/opendilab/DI-store): RL Object Store\n\n\u003c/details\u003e\n\nHave fun with exploration and exploitation.\n\n## Outline\n\n- [Introduction to DI-engine](#introduction-to-di-engine)\n- [Outline](#outline)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Feature](#feature)\n  - [Algorithm Versatility](#algorithm-versatility)\n  - [Environment 
Versatility](#environment-versatility)\n  - [General Data Container: TreeTensor](#general-data-container-treetensor)\n- [Feedback and Contribution](#feedback-and-contribution)\n- [Supporters](#supporters)\n  - [↳ Stargazers](#-stargazers)\n  - [↳ Forkers](#-forkers)\n- [Citation](#citation)\n- [License](#license)\n\n## Installation\n\nYou can install DI-engine from PyPI with the following command:\n\n```bash\npip install DI-engine\n```\n\nFor more information about installation, refer to the [installation guide](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/installation.html).\n\nOur Docker Hub repository can be found [here](https://hub.docker.com/repository/docker/opendilab/ding). We provide a `base image` and several `env image` variants with common RL environments.\n\n\u003cdetails close\u003e\n\u003csummary\u003e(Click for Details)\u003c/summary\u003e\n\n- base: opendilab/ding:nightly\n- rpc: opendilab/ding:nightly-rpc\n- atari: opendilab/ding:nightly-atari\n- mujoco: opendilab/ding:nightly-mujoco\n- dmc: opendilab/ding:nightly-dmc2gym\n- metaworld: opendilab/ding:nightly-metaworld\n- smac: opendilab/ding:nightly-smac\n- grf: opendilab/ding:nightly-grf\n- cityflow: opendilab/ding:nightly-cityflow\n- evogym: opendilab/ding:nightly-evogym\n- d4rl: opendilab/ding:nightly-d4rl\n\n\u003c/details\u003e\n\nThe detailed documentation is hosted on [doc](https://di-engine-docs.readthedocs.io/en/latest/) | [中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/).\n\n## Quick Start\n\n[3 Minutes Kickoff](https://di-engine-docs.readthedocs.io/en/latest/01_quickstart/first_rl_program.html)\n\n[3 Minutes Kickoff (colab)](https://colab.research.google.com/drive/1_7L-QFDfeCvMvLJzRyBRUW5_Q6ESXcZ4)\n\n[DI-engine Huggingface Kickoff (colab)](https://colab.research.google.com/drive/1UH1GQOjcHrmNSaW77hnLGxFJrLSLwCOk)\n\n[How to migrate a new **RL Env**](https://di-engine-docs.readthedocs.io/en/latest/11_dizoo/index.html) | 
[如何迁移一个新的**强化学习环境**](https://di-engine-docs.readthedocs.io/zh_CN/latest/11_dizoo/index_zh.html)\n\n[How to customize the neural network model](https://di-engine-docs.readthedocs.io/en/latest/04_best_practice/custom_model.html) | [如何定制策略使用的**神经网络模型**](https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/custom_model_zh.html)\n\n[Example of testing/deploying an **RL policy**](https://github.com/opendilab/DI-engine/blob/main/dizoo/classic_control/cartpole/entry/cartpole_c51_deploy.py)\n\n[Comparison of the new and old pipelines (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/04_best_practice/diff_in_new_pipeline_zh.html)\n\n## Feature\n\n### Algorithm Versatility\n\n\u003cdetails open\u003e\n\u003csummary\u003e(Click to Collapse)\u003c/summary\u003e\n\n![discrete](https://img.shields.io/badge/-discrete-brightgreen) \u0026nbsp;means discrete action space; this label is only used for normal DRL algorithms (1-23)\n\n![continuous](https://img.shields.io/badge/-continous-green) \u0026nbsp;means continuous action space; this label is only used for normal DRL algorithms (1-23)\n\n![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) \u0026nbsp;means hybrid (discrete + continuous) action space (1-23)\n\n![dist](https://img.shields.io/badge/-distributed-blue) \u0026nbsp;[Distributed Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/distributed_rl.html)｜[分布式强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/distributed_rl_zh.html)\n\n![MARL](https://img.shields.io/badge/-MARL-yellow) \u0026nbsp;[Multi-Agent Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/multi_agent_cooperation_rl.html)｜[多智能体强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/multi_agent_cooperation_rl_zh.html)\n\n![exp](https://img.shields.io/badge/-exploration-orange) \u0026nbsp;[Exploration Mechanisms in Reinforcement 
Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/exploration_rl.html)｜[强化学习中的探索机制](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/exploration_rl_zh.html)\n\n![IL](https://img.shields.io/badge/-IL-purple) \u0026nbsp;[Imitation Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/imitation_learning.html)｜[模仿学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/imitation_learning_zh.html)\n\n![offline](https://img.shields.io/badge/-offlineRL-darkblue) \u0026nbsp;[Offline Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/offline_rl.html)｜[离线强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/offline_rl_zh.html)\n\n![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue) \u0026nbsp;[Model-Based Reinforcement Learning](https://di-engine-docs.readthedocs.io/en/latest/02_algo/model_based_rl.html)｜[基于模型的强化学习](https://di-engine-docs.readthedocs.io/zh_CN/latest/02_algo/model_based_rl_zh.html)\n\n![other](https://img.shields.io/badge/-other-lightgrey) \u0026nbsp;means algorithms from other sub-directions, usually used as plug-ins in the whole pipeline\n\nP.S.: The `.py` files listed under `Runnable Demo` can be found in `dizoo`\n\n\n| No. 
|                                                              Algorithm                                                              |                                                                                     Label                                                                                     |                                                                                                                                   Doc and Implementation                                                                                                                                   |                                      Runnable Demo                                      |\n| :-: | :---------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------: |\n|  1  |                             [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |             [DQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqn.html)\u003cbr\u003e[DQN中文文档](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/dqn_zh.html)\u003cbr\u003e[policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py)             
|     python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0     |\n|  2  |                                             [C51](https://arxiv.org/pdf/1707.06887.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [C51 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/c51.html)\u003cbr\u003e[policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py)                                                            |                      ding -m serial -c cartpole_c51_config.py -s 0                      |\n|  3  |                                            [QRDQN](https://arxiv.org/pdf/1710.10044.pdf)                                            |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                        [QRDQN doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qrdqn.html)\u003cbr\u003e[policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py)                                                        |                     ding -m serial -c cartpole_qrdqn_config.py -s 0                     |\n|  4  |                                             [IQN](https://arxiv.org/pdf/1806.06923.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [IQN 
doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/iqn.html)\u003cbr\u003e[policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py)                                                            |                      ding -m serial -c cartpole_iqn_config.py -s 0                      |\n|  5  |                                             [FQF](https://arxiv.org/pdf/1911.02140.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [FQF doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/fqf.html)\u003cbr\u003e[policy/fqf](https://github.com/opendilab/DI-engine/blob/main/ding/policy/fqf.py)                                                            |                      ding -m serial -c cartpole_fqf_config.py -s 0                      |\n|  6  |                                           [Rainbow](https://arxiv.org/pdf/1710.02298.pdf)                                           |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                    [Rainbow doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rainbow.html)\u003cbr\u003e[policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py)                                                    |                    ding -m serial -c cartpole_rainbow_config.py -s 0                    |\n|  7  |                                             [SQL](https://arxiv.org/pdf/1702.08165.pdf)                                             |                          
![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)                          |                                                            [SQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sql.html)\u003cbr\u003e[policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py)                                                            |                      ding -m serial -c cartpole_sql_config.py -s 0                      |\n|  8  |                                         [R2D2](https://openreview.net/forum?id=r1lyTjAqYX)                                         |                            ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen)                            |                                                          [R2D2 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d2.html)\u003cbr\u003e[policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py)                                                          |                      ding -m serial -c cartpole_r2d2_config.py -s 0                      |\n|  9  |                   [PG](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf)                   |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                             [PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)\u003cbr\u003e[policy/pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pg.py)                                                             |                       ding -m serial -c cartpole_pg_config.py -s 0                       |\n| 10 |                                            
[PromptPG](https://arxiv.org/abs/2209.14610)                                            |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                                                               [policy/prompt_pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_pg.py)                                                                                               |                   ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0                   |\n| 11 |                                             [A2C](https://arxiv.org/pdf/1602.01783.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [A2C doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/a2c.html)\u003cbr\u003e[policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py)                                                            |                      ding -m serial -c cartpole_a2c_config.py -s 0                      |\n| 12 |                        [PPO](https://arxiv.org/abs/1707.06347)/[MAPPO](https://arxiv.org/pdf/2103.01955.pdf)                        | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) |                                                            [PPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppo.html)\u003cbr\u003e[policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py)                                                            | python3 -u cartpole_ppo_main.py / ding -m 
serial_onpolicy -c cartpole_ppo_config.py -s 0 |\n| 13 |                                             [PPG](https://arxiv.org/pdf/2009.04416.pdf)                                             |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                            [PPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ppg.html)\u003cbr\u003e[policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py)                                                            |                             python3 -u cartpole_ppg_main.py                             |\n| 14 |                                            [ACER](https://arxiv.org/pdf/1611.01224.pdf)                                            |                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)                          |                                                          [ACER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/acer.html)\u003cbr\u003e[policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py)                                                          |                      ding -m serial -c cartpole_acer_config.py -s 0                      |\n| 15 |                                             [IMPALA](https://arxiv.org/abs/1802.01561)                                             |                            ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen)                            |                                                      [IMPALA 
doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/impala.html)\u003cbr\u003e[policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py)                                                      |                     ding -m serial -c cartpole_impala_config.py -s 0                     |\n| 16 |                     [DDPG](https://arxiv.org/pdf/1509.02971.pdf)/[PADDPG](https://arxiv.org/pdf/1511.04143.pdf)                     |                             ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                             |                                                          [DDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)\u003cbr\u003e[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py)                                                          |                      ding -m serial -c pendulum_ddpg_config.py -s 0                      |\n| 17 |                                             [TD3](https://arxiv.org/pdf/1802.09477.pdf)                                             |                             ![continuous](https://img.shields.io/badge/-continous-green)![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                             |                                                            [TD3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)\u003cbr\u003e[policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py)                                                            |     python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0     |\n| 18 |                                            [D4PG](https://arxiv.org/pdf/1804.08617.pdf)                                            |                                                         ![continuous](https://img.shields.io/badge/-continous-green)           
                                              |                                                          [D4PG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/d4pg.html)\u003cbr\u003e[policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py)                                                          |                            python3 -u pendulum_d4pg_config.py                            |\n| 19 |                                           [SAC](https://arxiv.org/abs/1801.01290)/[MASAC]                                           | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green)![MARL](https://img.shields.io/badge/-MARL-yellow) |                                                            [SAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sac.html)\u003cbr\u003e[policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py)                                                            |                      ding -m serial -c pendulum_sac_config.py -s 0                      |\n| 20 |                                            [PDQN](https://arxiv.org/pdf/1810.06394.pdf)                                            |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                    [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py)                                                                                                    |                     ding -m serial -c gym_hybrid_pdqn_config.py -s 0                     |\n| 21 |                                            [MPDQN](https://arxiv.org/pdf/1905.04388.pdf)                                            |                              
                             ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                    [policy/pdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/pdqn.py)                                                                                                    |                    ding -m serial -c gym_hybrid_mpdqn_config.py -s 0                    |\n| 22 |                                            [HPPO](https://arxiv.org/pdf/1903.01344.pdf)                                            |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                     [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py)                                                                                                     |                ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0                |\n| 23 |                                             [BDQ](https://arxiv.org/pdf/1711.08946.pdf)                                             |                                                           ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                           |                                                                                                     [policy/bdq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py)                                                                                                     |                             python3 -u hopper_bdq_config.py                             |\n| 24 |                                              [MDQN](https://arxiv.org/abs/2007.14430)   
                                           |                                                        ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                        |                                                                                                    [policy/mdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mdqn.py)                                                                                                    |                            python3 -u asterix_mdqn_config.py                            |\n| 25 |                                            [QMIX](https://arxiv.org/pdf/1803.11485.pdf)                                            |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                          [QMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qmix.html)\u003cbr\u003e[policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py)                                                          |                     ding -m serial -c smac_3s5z_qmix_config.py -s 0                     |\n| 26 |                                            [COMA](https://arxiv.org/pdf/1705.08926.pdf)                                            |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                          [COMA doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/coma.html)\u003cbr\u003e[policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py)                                                          |                     ding -m serial -c smac_3s5z_coma_config.py -s 0                     |\n| 
27 |                                              [QTran](https://arxiv.org/abs/1905.05408)                                              |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                                                                   [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py)                                                                                                   |                     ding -m serial -c smac_3s5z_qtran_config.py -s 0                     |\n| 28 |                                              [WQMIX](https://arxiv.org/abs/2006.10800)                                              |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                        [WQMIX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/wqmix.html)\u003cbr\u003e[policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py)                                                        |                     ding -m serial -c smac_3s5z_wqmix_config.py -s 0                     |\n| 29 |                                           [CollaQ](https://arxiv.org/pdf/2010.08531.pdf)                                           |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                      [CollaQ doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/collaq.html)\u003cbr\u003e[policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py)                                                     
 |                    ding -m serial -c smac_3s5z_collaq_config.py -s 0                    |\n| 30 |                                           [MADDPG](https://arxiv.org/pdf/1706.02275.pdf)                                           |                                                              ![MARL](https://img.shields.io/badge/-MARL-yellow)                                                              |                                                         [MADDPG doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/ddpg.html)\u003cbr\u003e[policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py)                                                         |                ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0                |\n| 31 |                                            [GAIL](https://arxiv.org/pdf/1606.03476.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                               [GAIL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/gail.html)\u003cbr\u003e[reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py)                                               |                 ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0                 |\n| 32 |                                            [SQIL](https://arxiv.org/pdf/1905.11108.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                    [SQIL 
doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/sqil.html)\u003cbr\u003e[entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py)                                                    |                   ding -m serial_sqil -c cartpole_sqil_config.py -s 0                   |\n| 33 |                                            [DQFD](https://arxiv.org/pdf/1704.03732.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                          [DQFD doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/dqfd.html)\u003cbr\u003e[policy/dqfd](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqfd.py)                                                          |                   ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0                   |\n| 34 |                                            [R2D3](https://arxiv.org/pdf/1909.01387.pdf)                                            |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |       [R2D3 doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/r2d3.html)\u003cbr\u003e[R2D3 doc (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)\u003cbr\u003e[policy/r2d3](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/r2d3_zh.html)       |                        python3 -u pong_r2d3_r2d2expert_config.py                        |\n| 35 |                                    [Guided Cost Learning](https://arxiv.org/pdf/1603.00448.pdf)                                    |                                                                ![IL](https://img.shields.io/badge/-IL-purple)               
                                                 |                      [Guided Cost Learning doc (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/guided_cost_zh.html)\u003cbr\u003e[reward_model/guided_cost](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/guided_cost_reward_model.py)                      |                            python3 lunarlander_gcl_config.py                            |\n| 36 |                                              [TREX](https://arxiv.org/abs/1904.06387)                                              |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                             [TREX doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/trex.html)\u003cbr\u003e[reward_model/trex](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/trex_reward_model.py)                                             |                               python3 mujoco_trex_main.py                               |\n| 37 |                               [Implicit Behavioral Cloning](https://implicitbc.github.io/) (DFO+MCMC)                               |                                                                ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                  [policy/ibc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ibc.py) \u003cbr\u003e [model/template/ebm](https://github.com/opendilab/DI-engine/blob/main/ding/model/template/ebm.py)                                                  |              python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py              |\n| 38 |                                             [BCO](https://arxiv.org/pdf/1805.01954.pdf)                                             |                         
                                       ![IL](https://img.shields.io/badge/-IL-purple)                                                                |                                                                                                [entry/bco](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_bco.py)                                                                                                |                            python3 -u cartpole_bco_config.py                            |\n| 39 |                                             [HER](https://arxiv.org/pdf/1707.01495.pdf)                                             |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           |                                               [HER doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/her.html)\u003cbr\u003e[reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py)                                               |                              python3 -u bitflip_her_dqn.py                              |\n| 40 |                                               [RND](https://arxiv.org/abs/1810.12894)                                               |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           |                                               [RND doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/rnd.html)\u003cbr\u003e[reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py)                                               |                         python3 -u cartpole_rnd_onppo_config.py                         |\n| 41 |                                             
[ICM](https://arxiv.org/pdf/1705.05363.pdf)                                             |                                                           ![exp](https://img.shields.io/badge/-exploration-orange)                                                           | [ICM doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/icm.html)\u003cbr\u003e[ICM doc (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/12_policies/icm_zh.html)\u003cbr\u003e[reward_model/icm](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/icm_reward_model.py) |                          python3 -u cartpole_ppo_icm_config.py                          |\n| 42 |                                             [CQL](https://arxiv.org/pdf/2006.04779.pdf)                                             |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                            [CQL doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/cql.html)\u003cbr\u003e[policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py)                                                            |                               python3 -u d4rl_cql_main.py                               |\n| 43 |                                            [TD3BC](https://arxiv.org/pdf/2106.06860.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                      [TD3BC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3_bc.html)\u003cbr\u003e[policy/td3_bc](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3_bc.py)                                                      |                              
python3 -u d4rl_td3_bc_main.py                              |\n| 44 |                                    [Decision Transformer](https://arxiv.org/pdf/2106.01345.pdf)                                    |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                                                                      [policy/dt](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dt.py)                                                                                                      |                               python3 -u d4rl_dt_mujoco.py                               |\n| 45 |                                            [EDAC](https://arxiv.org/pdf/2110.01548.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                          [EDAC doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/edac.html)\u003cbr\u003e[policy/edac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/edac.py)                                                          |                               python3 -u d4rl_edac_main.py                               |\n| 46 |                                            [QGPO](https://arxiv.org/pdf/2304.12824.pdf)                                            |                                                         ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                         |                                                          [QGPO 
doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/qgpo.html)\u003cbr\u003e[policy/qgpo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qgpo.py)                                                          |                             python3 -u ding/example/qgpo.py                             |\n| 47 |   MBSAC([SAC](https://arxiv.org/abs/1801.01290)+[MVE](https://arxiv.org/abs/1803.00101)+[SVG](https://arxiv.org/abs/1510.09142))   |                           ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                           |                                                                                          [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py)                                                                                          |   python3 -u pendulum_mbsac_mbpo_config.py / python3 -u pendulum_mbsac_ddppo_config.py   |\n| 48 | STEVESAC([SAC](https://arxiv.org/abs/1801.01290)+[STEVE](https://arxiv.org/abs/1807.01675)+[SVG](https://arxiv.org/abs/1510.09142)) |                           ![continuous](https://img.shields.io/badge/-continous-green)![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                           |                                                                                          [policy/mbpolicy/mbsac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/mbpolicy/mbsac.py)                                                                                          |                       python3 -u pendulum_stevesac_mbpo_config.py                       |\n| 49 |                                            [MBPO](https://arxiv.org/pdf/1906.08253.pdf)                                            |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                   
                      |                                                     [MBPO doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/mbpo.html)\u003cbr\u003e[world_model/mbpo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/mbpo.py)                                                     |                          python3 -u pendulum_sac_mbpo_config.py                          |\n| 50 |                                        [DDPPO](https://openreview.net/forum?id=rzvOQrnclO0)                                        |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                                         |                                                                                              [world_model/ddppo](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/ddppo.py)                                                                                              |                        python3 -u pendulum_mbsac_ddppo_config.py                        |\n| 51 |                                          [DreamerV3](https://arxiv.org/pdf/2301.04104.pdf)                                          |                                                         ![mbrl](https://img.shields.io/badge/-ModelBasedRL-lightblue)                                                         |                                                                                          [world_model/dreamerv3](https://github.com/opendilab/DI-engine/blob/main/ding/world_model/dreamerv3.py)                                                                                          |                      python3 -u cartpole_balance_dreamer_config.py                      |\n| 52 |                                             [PER](https://arxiv.org/pdf/1511.05952.pdf)                                             |                                                            
![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                                   [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py)                                                                                   |                                      `rainbow demo`                                      |\n| 53 |                                             [GAE](https://arxiv.org/pdf/1506.02438.pdf)                                             |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                                                   [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py)                                                                                                   |                                        `ppo demo`                                        |\n| 54 |                                           [ST-DIM](https://arxiv.org/pdf/1906.08226.pdf)                                           |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                              [torch_utils/loss/contrastive_loss](https://github.com/opendilab/DI-engine/blob/main/ding/torch_utils/loss/contrastive_loss.py)                                                                              |                   ding -m serial -c cartpole_dqn_stdim_config.py -s 0                   |\n| 55 |                                             [PLR](https://arxiv.org/pdf/2010.03934.pdf)                             
                |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                       [PLR doc](https://di-engine-docs.readthedocs.io/en/latest/12_policies/plr.html)\u003cbr\u003e[data/level_replay/level_sampler](https://github.com/opendilab/DI-engine/blob/main/ding/data/level_replay/level_sampler.py)                                       |                          python3 -u bigfish_plr_config.py -s 0                          |\n| 56 |                                           [PCGrad](https://arxiv.org/pdf/2001.06782.pdf)                                           |                                                            ![other](https://img.shields.io/badge/-other-lightgrey)                                                            |                                                                             [torch_utils/optimizer_helper/PCGrad](https://github.com/opendilab/DI-engine/blob/main/ding/data/torch_utils/optimizer_helper.py)                                                                             |                        python3 -u multi_mnist_pcgrad_main.py -s 0                        |\n| 57 |                                           [AWR](https://arxiv.org/pdf/1910.00177)                                                   |                                                            ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                    |                                                                             [policy/prompt_awr](https://github.com/opendilab/DI-engine/blob/main/ding/policy/prompt_awr.py)                                                                                                                     |                        python3 -u tabmwp_awr_config.py                                   |\n\n\u003c/details\u003e\n\n### 
Environment Versatility\n\n\u003cdetails open\u003e\n\u003csummary\u003e(Click to Collapse)\u003c/summary\u003e\n\n| No |                                          Environment                                          |                                                                                                                   Label                                                                                                                   |                                             Visualization                                             |                                                                                                                                     Code and Doc Links                                                                                                                                     |\n| :-: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |\n| 1 |               [Atari](https://ale.farama.org)               |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                  ![original](./dizoo/atari/atari.gif)                                  |               
[dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs) \u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/atari.html)\u003cbr\u003e[env guide (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/atari_zh.html)               |\n| 2 |        [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d)        |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                         ![original](./dizoo/box2d/bipedalwalker/original.gif)                         | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/bipedalwalker.html)\u003cbr\u003e[env guide (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bipedalwalker_zh.html) |\n| 3 |         [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d)         |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                         ![original](./dizoo/box2d/lunarlander/lunarlander.gif)                         |    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)\u003cbr\u003e[env guide (Chinese)](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/lunarlander_zh.html)    |\n| 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                      
![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                       ![original](./dizoo/classic_control/cartpole/cartpole.gif)                       |   [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/cartpole.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/cartpole_zh.html)   |\n| 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                       ![original](./dizoo/classic_control/pendulum/pendulum.gif)                       |   [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pendulum.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pendulum_zh.html)   |\n| 6 |                [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl)                |                                                         ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                         |                         ![original](./dizoo/competitive_rl/competitive_rl.gif)                         |                                                     [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/competitive_rl)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/competitive_rl_zh.html)                  
                                   |\n| 7 |                    [gfootball](https://github.com/google-research/football)                    |                          ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange)![selfplay](https://img.shields.io/badge/-selfplay-blue)                          |                              ![original](./dizoo/gfootball/gfootball.gif)                              |           [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gfootball/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gfootball_zh.html)           |\n| 8 |                      [minigrid](https://github.com/maximecb/gym-minigrid)                      |                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![sparse](https://img.shields.io/badge/-sparse%20reward-orange)                                                      |                               ![original](./dizoo/minigrid/minigrid.gif)                               |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/minigrid_zh.html)             |\n| 9 |              [MuJoCo](https://github.com/openai/gym/tree/master/gym/envs/mujoco)              |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                                 ![original](./dizoo/mujoco/mujoco.gif)                                 |                [dizoo 
link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mujoco/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/en/latest/13_envs/mujoco_zh.html)                |\n| 10 |                 [PettingZoo](https://github.com/Farama-Foundation/PettingZoo)                 |                              ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow)                              |                   ![original](./dizoo/petting_zoo/petting_zoo_mpe_simple_spread.gif)                   |        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/petting_zoo/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/pettingzoo.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pettingzoo_zh.html)        |\n| 11 |               [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo)               |                                                            ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)                                                            |                             ![original](./dizoo/overcooked/overcooked.gif)                             |                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooked/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/overcooked.html)                                                       |\n| 12 |                          [procgen](https://github.com/openai/procgen)                          |                                                                                      
![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/procgen/coinrun.gif)                                |               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/procgen.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/procgen_zh.html)               |\n| 13 |                      [pybullet](https://github.com/benelot/pybullet-gym)                      |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                               ![original](./dizoo/pybullet/pybullet.gif)                               |                                                        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/pybullet_zh.html)                                                        |\n| 14 |                            [smac](https://github.com/oxwhirl/smac)                            | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue)![sparse](https://img.shields.io/badge/-sparse%20reward-orange) |                                   ![original](./dizoo/smac/smac.gif)                                   |                 [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs)\u003cbr\u003e[env 
tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/smac.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/smac_zh.html)                 |\n| 15 |                         [d4rl](https://github.com/rail-berkeley/d4rl)                         |                                                                                       ![offline](https://img.shields.io/badge/-offlineRL-darkblue)                                                                                       |                                      ![ori](dizoo/d4rl/d4rl.gif)                                      |                                                              [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/d4rl_zh.html)                                                              |\n| 16 |                                          league_demo                                          |                                                         ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                         |                            ![original](./dizoo/league_demo/league_demo.png)                            |                                                                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs)                                                                                                    |\n| 17 |                                          pomdp atari                                          |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |         
                                                                                               |                                                                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs)                                                                                                       |\n| 18 |                          [bsuite](https://github.com/deepmind/bsuite)                          |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                 ![original](./dizoo/bsuite/bsuite.png)                                 |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs//bsuite.html) \u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bsuite_zh.html)             |\n| 19 |                             [ImageNet](https://www.image-net.org/)                             |                                                                                             ![IL](https://img.shields.io/badge/-IL/SL-purple)                                                                                             |                         ![original](./dizoo/image_classification/imagenet.png)                         |                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/image_cls_zh.html)                                                    |\n| 20 |                 [slime_volleyball](https://github.com/hardmaru/slimevolleygym)                 |                           
                               ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![selfplay](https://img.shields.io/badge/-selfplay-blue)                                                          |                              ![ori](dizoo/slime_volley/slime_volley.gif)                              |    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/slime_volleyball.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/slime_volleyball_zh.html)    |\n| 21 |                    [gym_hybrid](https://github.com/thomashirtz/gym-hybrid)                    |                                                                                         ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                                                         |                                 ![ori](dizoo/gym_hybrid/moving_v0.gif)                                 |           [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_hybrid)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_hybrid.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_hybrid_zh.html)           |\n| 22 |                       [GoBigger](https://github.com/opendilab/GoBigger)                       |                                    ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)![marl](https://img.shields.io/badge/-MARL-yellow)![selfplay](https://img.shields.io/badge/-selfplay-blue)                                    |                                 ![ori](./dizoo/gobigger_overview.gif)                                 |                                [dizoo link](https://github.com/opendilab/GoBigger-Challenge-2021/tree/main/di_baseline)\u003cbr\u003e[env 
tutorial](https://gobigger.readthedocs.io/en/latest/index.html)\u003cbr\u003e[环境指南](https://gobigger.readthedocs.io/zh_CN/latest/)                                |\n| 23 |                       [gym_soccer](https://github.com/openai/gym-soccer)                       |                                                                                         ![hybrid](https://img.shields.io/badge/-hybrid-darkgreen)                                                                                         |                              ![ori](dizoo/gym_soccer/half_offensive.gif)                              |                                                        [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_soccer)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_soccer_zh.html)                                                        |\n| 24 |           [multiagent_mujoco](https://github.com/schroederdewitt/multiagent_mujoco)           |                                                              ![continuous](https://img.shields.io/badge/-continous-green) ![marl](https://img.shields.io/badge/-MARL-yellow)                                                              |                                 ![original](./dizoo/mujoco/mujoco.gif)                                 |                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_mujoco/envs)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/mujoco_zh.html)                                                    |\n| 25 |                                            bitflip                                            |                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![sparse](https://img.shields.io/badge/-sparse%20reward-orange)                                                      |                         
       ![original](./dizoo/bitflip/bitflip.gif)                                |                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bitflip/envs)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/bitflip_zh.html)                                                         |\n| 26 |                      [sokoban](https://github.com/mpSchrader/gym-sokoban)                      |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      | ![Game 2](https://github.com/mpSchrader/gym-sokoban/raw/default/docs/Animations/solved_4.gif?raw=true) |             [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/sokoban/envs)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/sokoban.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/sokoban_zh.html)             |\n| 27 |                   [gym_anytrading](https://github.com/AminHP/gym-anytrading)                   |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                         ![original](./dizoo/gym_anytrading/envs/position.png)                         |                                                [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_anytrading) \u003cbr\u003e [env tutorial](https://github.com/opendilab/DI-engine/blob/main/dizoo/gym_anytrading/envs/README.md)                                                |\n| 28 |                   [mario](https://github.com/Kautenja/gym-super-mario-bros)                   |                                   
                                                   ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                  ![original](./dizoo/mario/mario.gif)                                  |  [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/mario) \u003cbr\u003e [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/gym_super_mario_bros.html) \u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/gym_super_mario_bros_zh.html)  |\n| 29 |                       [dmc2gym](https://github.com/denisyarats/dmc2gym)                       |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                            ![original](./dizoo/dmc2gym/dmc2gym_cheetah.png)                            |               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/dmc2gym)\u003cbr\u003e[env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/dmc2gym.html)\u003cbr\u003e[环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/dmc2gym_zh.html)               |\n| 30 |                        [evogym](https://github.com/EvolutionGym/evogym)                        |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                                 ![original](./dizoo/evogym/evogym.gif)                                 |            [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/evogym/envs) \u003cbr\u003e [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/evogym.html) 
\u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/Evogym_zh.html)            |\n| 31 |             [gym-pybullet-drones](https://github.com/utiasDSL/gym-pybullet-drones)             |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                    ![original](./dizoo/gym_pybullet_drones/gym_pybullet_drones.gif)                    |                                                                                          [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/gym_pybullet_drones/envs)\u003cbr\u003e环境指南                                                                                          |\n| 32 |                 [beergame](https://github.com/OptMLGroup/DeepBeerInventory-RL)                 |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                               ![original](./dizoo/beergame/beergame.png)                               |                                                                                               [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/beergame/envs)\u003cbr\u003e环境指南                                                                                               |\n| 33 | [classic_control/acrobot](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                        
![original](./dizoo/classic_control/acrobot/acrobot.gif)                        |                                                [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/acrobot/envs)\u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/acrobot_zh.html)                                                |\n| 34 |   [box2d/car_racing](https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py)   |                                                     ![discrete](https://img.shields.io/badge/-discrete-brightgreen) \u003cbr\u003e ![continuous](https://img.shields.io/badge/-continous-green)                                                     |                          ![original](./dizoo/box2d/carracing/car_racing.gif)                          |                                                                                            [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/carracing/envs)\u003cbr\u003e环境指南                                                                                            |\n| 35 |                     [metadrive](https://github.com/metadriverse/metadrive)                     |                                                                                       ![continuous](https://img.shields.io/badge/-continous-green)                                                                                       |                            ![original](./dizoo/metadrive/metadrive_env.gif)                            |                                                       [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/metadrive/env)\u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/metadrive_zh.html)                                                       |\n| 36 |  [cliffwalking](https://github.com/openai/gym/blob/master/gym/envs/toy_text/cliffwalking.py)  |                                        
                                              ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                          ![original](./dizoo/cliffwalking/cliff_walking.gif)                          |                                                                                    [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/cliffwalking/envs)\u003cbr\u003e env tutorial \u003cbr\u003e 环境指南                                                                                    |\n| 37 |                       [tabmwp](https://promptpg.github.io/explore.html)                       |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/tabmwp/tabmwp.jpeg)                                |                                                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/tabmwp) \u003cbr\u003e env tutorial \u003cbr\u003e 环境指南                                                                                         |\n| 38 |            [frozen_lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake)      |                                                                                      ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                      |                                ![original](./dizoo/frozen_lake/FrozenLake.gif)                        |                                                                                         [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/frozen_lake) \u003cbr\u003e env tutorial 
\u003cbr\u003e 环境指南                                                                                         |\n| 39 | [ising_model](https://github.com/mlii/mfrl/tree/master/examples/ising_model)                  |                            ![discrete](https://img.shields.io/badge/-discrete-brightgreen) ![marl](https://img.shields.io/badge/-MARL-yellow)                                                                                             |                                ![original](./dizoo/ising_env/ising_env.gif)                           | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/ising_env) \u003cbr\u003e env tutorial \u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh_CN/latest/13_envs/ising_model_zh.html) |\n| 40 | [taxi](https://www.gymlibrary.dev/environments/toy_text/taxi/)                  |                            ![discrete](https://img.shields.io/badge/-discrete-brightgreen)                                                                                 |                                ![original](./dizoo/taxi/Taxi-v3_episode_0.gif)                           | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/taxi/envs) \u003cbr\u003e [env tutorial](https://di-engine-docs.readthedocs.io/en/latest/13_envs/taxi.html) \u003cbr\u003e [环境指南](https://di-engine-docs.readthedocs.io/zh-cn/latest/13_envs/taxi_zh.html) |\n\n\n\n![discrete](https://img.shields.io/badge/-discrete-brightgreen) means discrete action space\n\n![continuous](https://img.shields.io/badge/-continous-green) means continuous action space\n\n![hybrid](https://img.shields.io/badge/-hybrid-darkgreen) means hybrid (discrete + continuous) action space\n\n![MARL](https://img.shields.io/badge/-MARL-yellow) means multi-agent RL environment\n\n![sparse](https://img.shields.io/badge/-sparse%20reward-orange) means an environment related to exploration and sparse rewards\n\n![offline](https://img.shields.io/badge/-offlineRL-darkblue) 
means offline RL environment\n\n![IL](https://img.shields.io/badge/-IL/SL-purple) means Imitation Learning or Supervised Learning Dataset\n\n![selfplay](https://img.shields.io/badge/-selfplay-blue) means an environment that allows agent vs. agent battle\n\nP.S. Some environments in Atari, such as **MontezumaRevenge**, are also of the sparse reward type.\n\n\u003c/details\u003e\n\n### General Data Container: TreeTensor\n\nDI-engine uses [TreeTensor](https://github.com/opendilab/DI-treetensor) as the basic data container in various components. It is easy to use and consistent across different code modules such as environment definition, data processing, and DRL optimization. Here are some concrete code examples:\n\n- TreeTensor can easily extend all the operations of `torch.Tensor` to nested data:\n\n  \u003cdetails close\u003e\n  \u003csummary\u003e(Click for Details)\u003c/summary\u003e\n\n  ```python\n  import treetensor.torch as ttorch\n\n\n  # create random tensor\n  data = ttorch.randn({'a': (3, 2), 'b': {'c': (3, )}})\n  # clone+detach tensor\n  data_clone = data.clone().detach()\n  # access tree structure like attribute\n  a = data.a\n  c = data.b.c\n  # stack/cat/split\n  stacked_data = ttorch.stack([data, data_clone], 0)\n  cat_data = ttorch.cat([data, data_clone], 0)\n  data, data_clone = ttorch.split(stacked_data, 1)\n  # reshape\n  data = data.unsqueeze(-1)\n  data = data.squeeze(-1)\n  flatten_data = data.view(-1)\n  # indexing\n  data_0 = data[0]\n  data_1to2 = data[1:2]\n  # execute math calculations\n  data = data.sin()\n  data.b.c.cos_().clamp_(-1, 1)\n  data += data ** 2\n  # backward\n  data.requires_grad_(True)\n  loss = data.arctan().mean()\n  loss.backward()\n  # print shape\n  print(data.shape)\n  # result\n  # \u003cSize 0x7fbd3346ddc0\u003e\n  # ├── 'a' --\u003e torch.Size([1, 3, 2])\n  # └── 'b' --\u003e \u003cSize 0x7fbd3346dd00\u003e\n  #     └── 'c' --\u003e torch.Size([1, 3])\n  ```\n\n  \u003c/details\u003e\n- TreeTensor can make it 
simple yet effective to implement a classic deep reinforcement learning pipeline:\n\n  \u003cdetails close\u003e\n  \u003csummary\u003e(Click for Details)\u003c/summary\u003e\n\n  ```diff\n  import torch\n  import treetensor.torch as ttorch\n\n  B = 4\n\n\n  def get_item():\n      return {\n          'obs': {\n              'scalar': torch.randn(12),\n              'image': torch.randn(3, 32, 32),\n          },\n          'action': torch.randint(0, 10, size=(1,)),\n          'reward': torch.rand(1),\n          'done': False,\n      }\n\n\n  data = [get_item() for _ in range(B)]\n\n\n  # execute `stack` op\n  - def stack(data, dim):\n  -     elem = data[0]\n  -     if isinstance(elem, torch.Tensor):\n  -         return torch.stack(data, dim)\n  -     elif isinstance(elem, dict):\n  -         return {k: stack([item[k] for item in data], dim) for k in elem.keys()}\n  -     elif isinstance(elem, bool):\n  -         return torch.BoolTensor(data)\n  -     else:\n  -         raise TypeError(\"not support elem type: {}\".format(type(elem)))\n  - stacked_data = stack(data, dim=0)\n  + data = [ttorch.tensor(d) for d in data]\n  + stacked_data = ttorch.stack(data, dim=0)\n\n  # validate\n  - assert stacked_data['obs']['image'].shape == (B, 3, 32, 32)\n  - assert stacked_data['action'].shape == (B, 1)\n  - assert stacked_data['reward'].shape == (B, 1)\n  - assert stacked_data['done'].shape == (B,)\n  - assert stacked_data['done'].dtype == torch.bool\n  + assert stacked_data.obs.image.shape == (B, 3, 32, 32)\n  + assert stacked_data.action.shape == (B, 1)\n  + assert stacked_data.reward.shape == (B, 1)\n  + assert stacked_data.done.shape == (B,)\n  + assert stacked_data.done.dtype == torch.bool\n  ```\n\n  \u003c/details\u003e\n\n## Feedback and Contribution\n\n- [File an issue](https://github.com/opendilab/DI-engine/issues/new/choose) on GitHub\n- Open or participate in our [forum](https://github.com/opendilab/DI-engine/discussions)\n- Discuss on DI-engine [discord 
server](https://discord.gg/dkZS2JF56X)\n- Discuss on DI-engine [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ)\n- Discuss on DI-engine's WeChat group (i.e. add us on WeChat: ding314assist)\n\n  \u003cimg src=https://github.com/opendilab/DI-engine/blob/main/assets/wechat.jpeg width=35% /\u003e\n- Contact us by email (opendilab@pjlab.org.cn)\n- Contribute to our future plans in the [Roadmap](https://github.com/opendilab/DI-engine/issues/548)\n\nWe appreciate all feedback and contributions that improve DI-engine, in both algorithms and system design. `CONTRIBUTING.md` offers the necessary information.\n\n## Supporters\n\n### \u0026#8627; Stargazers\n\n[![Stargazers repo roster for @opendilab/DI-engine](https://reporoster.com/stars/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/stargazers)\n\n### \u0026#8627; Forkers\n\n[![Forkers repo roster for @opendilab/DI-engine](https://reporoster.com/forks/opendilab/DI-engine)](https://github.com/opendilab/DI-engine/network/members)\n\n## Citation\n\n```latex\n@misc{ding,\n    title={DI-engine: A Universal AI System/Engine for Decision Intelligence},\n    author={Niu, Yazhe and Xu, Jingxin and Pu, Yuan and Nie, Yunpeng and Zhang, Jinouwen and Hu, Shuai and Zhao, Liangxuan and Zhang, Ming and Liu, Yu},\n    publisher={GitHub},\n    howpublished={\\url{https://github.com/opendilab/DI-engine}},\n    year={2021},\n}\n```\n\n## License\n\nDI-engine is released under the Apache 2.0 license.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendilab%2Fdi-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopendilab%2Fdi-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopendilab%2Fdi-engine/lists"}