{"id":29552032,"url":"https://github.com/nicklashansen/tdmpc2","last_synced_at":"2025-07-18T05:02:29.761Z","repository":{"id":203646578,"uuid":"710079101","full_name":"nicklashansen/tdmpc2","owner":"nicklashansen","description":"Code for \"TD-MPC2: Scalable, Robust World Models for Continuous Control\"","archived":false,"fork":false,"pushed_at":"2025-05-21T23:06:56.000Z","size":6067,"stargazers_count":539,"open_issues_count":6,"forks_count":120,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-22T00:22:01.692Z","etag":null,"topics":["reinforcement-learning","robotics","world-model"],"latest_commit_sha":null,"homepage":"https://www.tdmpc2.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicklashansen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-26T01:23:45.000Z","updated_at":"2025-05-21T23:32:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"ad88f09a-7613-423b-a822-b75bfd76ba0c","html_url":"https://github.com/nicklashansen/tdmpc2","commit_stats":{"total_commits":27,"total_committers":2,"mean_commits":13.5,"dds":0.03703703703703709,"last_synced_commit":"01cdf0f799712713f1729ae17a5c5f2053e49582"},"previous_names":["nicklashansen/tdmpc2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nicklashansen/tdmpc2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicklashansen%2Ftdmpc2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicklashansen%2Ftdmpc2/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicklashansen%2Ftdmpc2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicklashansen%2Ftdmpc2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicklashansen","download_url":"https://codeload.github.com/nicklashansen/tdmpc2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicklashansen%2Ftdmpc2/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265703046,"owners_count":23813914,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["reinforcement-learning","robotics","world-model"],"created_at":"2025-07-18T05:01:09.038Z","updated_at":"2025-07-18T05:02:29.733Z","avatar_url":"https://github.com/nicklashansen.png","language":"Python","funding_links":[],"categories":["4. 
算法"],"sub_categories":["4.2 Reinforcement Learning"],"readme":"\u003ch1\u003eTD-MPC2\u003c/h1\u003e\n\nOfficial implementation of\n\n[TD-MPC2: Scalable, Robust World Models for Continuous Control](https://www.tdmpc2.com) by\n\n[Nicklas Hansen](https://nicklashansen.github.io), [Hao Su](https://cseweb.ucsd.edu/~haosu)\\*, [Xiaolong Wang](https://xiaolonw.github.io)\\* (UC San Diego)\u003cbr/\u003e\n\n\u003cimg src=\"assets/0.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/1.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/2.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/3.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/4.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/5.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/6.gif\" width=\"12.5%\"\u003e\u003cimg src=\"assets/7.gif\" width=\"12.5%\"\u003e\u003cbr/\u003e\n\n[[Website]](https://www.tdmpc2.com) [[Paper]](https://arxiv.org/abs/2310.16828) [[Models]](https://www.tdmpc2.com/models) [[Dataset]](https://www.tdmpc2.com/dataset)\n\n----\n\n**Announcement (Apr 2025): support for episodic tasks!**\n\nWe have added support for episodic RL (tasks with terminations) in the latest release. This functionality can be enabled with `episodic=true` but remains disabled by default to ensure reproducibility of results across releases.\n\n----\n\n\n## Overview\n\nTD-MPC**2** is a scalable, robust model-based reinforcement learning algorithm. It compares favorably to existing model-free and model-based methods across **104** continuous control tasks spanning multiple domains, with a *single* set of hyperparameters (*right*). We further demonstrate the scalability of TD-MPC**2** by training a single 317M-parameter agent to perform **80** tasks across multiple domains, embodiments, and action spaces (*left*). 
\n\n\u003cimg src=\"assets/8.png\" width=\"100%\" style=\"max-width: 640px\"\u003e\u003cbr/\u003e\n\nThis repository contains code for training and evaluating both single-task online RL and multi-task offline RL TD-MPC**2** agents. We additionally open-source **300+** [model checkpoints](https://www.tdmpc2.com/models) (including 12 multi-task models) across 4 task domains: [DMControl](https://arxiv.org/abs/1801.00690), [Meta-World](https://meta-world.github.io/), [ManiSkill2](https://maniskill2.github.io/), and [MyoSuite](https://sites.google.com/view/myosuite), as well as our [30-task and 80-task datasets](https://www.tdmpc2.com/dataset) used to train the multi-task models. Our codebase supports both state and pixel observations. We hope that this repository will serve as a useful community resource for future research on model-based RL.\n\n----\n\n## Getting started\n\nYou will need a machine with a GPU and at least 12 GB of RAM for single-task online RL with TD-MPC**2**, and 128 GB of RAM for multi-task offline RL on our provided 80-task dataset. A GPU with at least 8 GB of memory is recommended for single-task online RL and for evaluation of the provided multi-task models (up to 317M parameters). Training the 317M-parameter model requires a GPU with at least 24 GB of memory.\n\nWe provide a `Dockerfile` for easy installation. You can build the Docker image by running\n\n```\ncd docker \u0026\u0026 docker build . -t \u003cuser\u003e/tdmpc2:1.0.1\n```\n\nThis Docker image contains all dependencies needed for running DMControl. 
We also provide a pre-built Docker image [here](https://hub.docker.com/repository/docker/nicklashansen/tdmpc2/tags/1.0.1/sha256-b07d4e04d4b28ffd9a63ac18ec1541950e874bb51d276c7d09b36135f170dd93).\n\nIf you prefer to use `conda` rather than Docker, start by running the following command:\n\n```\nconda env create -f docker/environment.yaml\n```\n\nThe `docker/environment.yaml` file installs the dependencies required for training on DMControl tasks. Dependencies for the other domains can be installed by following the instructions in the same file.\n\nIf you want to run ManiSkill2, you will additionally need to download and link the necessary assets by running\n\n```\npython -m mani_skill2.utils.download_asset all\n```\n\nwhich downloads assets to `./data`. You may move these assets to any location. Then, add the following line to your `~/.bashrc`:\n\n```\nexport MS2_ASSET_DIR=\u003cpath\u003e/\u003cto\u003e/\u003cdata\u003e\n```\n\nand restart your terminal. Note that Meta-World requires MuJoCo 2.1.0 and `gym==0.21.0`, which is becoming increasingly difficult to install. We host the unrestricted MuJoCo 2.1.0 license (courtesy of Google DeepMind) at [https://www.tdmpc2.com/files/mjkey.txt](https://www.tdmpc2.com/files/mjkey.txt). You can download the license by running\n\n```\nwget https://www.tdmpc2.com/files/mjkey.txt -O ~/.mujoco/mjkey.txt\n```\n\nDepending on your existing system packages, you may need to install other dependencies. See `docker/Dockerfile` for a list of recommended system packages.\n\n----\n\n## Supported tasks\n\nThis codebase provides support for all **104** continuous control tasks from **DMControl**, **Meta-World**, **ManiSkill2**, and **MyoSuite** used in our paper. Specifically, it supports 39 tasks from DMControl (including 11 custom tasks), 50 tasks from Meta-World, 5 tasks from ManiSkill2, and 10 tasks from MyoSuite. 
See the table below for the expected task-name format in each domain:\n\n| domain | task |\n| --- | --- |\n| dmcontrol | dog-run |\n| dmcontrol | cheetah-run-backwards |\n| metaworld | mw-assembly |\n| metaworld | mw-pick-place-wall |\n| maniskill | pick-cube |\n| maniskill | pick-ycb |\n| myosuite | myo-key-turn |\n| myosuite | myo-key-turn-hard |\n\nThese tasks can be run by passing the `task` argument to `train.py` or `evaluate.py`. Multi-task training and evaluation are specified by setting `task=mt80` or `task=mt30` for the 80-task and 30-task sets, respectively. While you generally do not need to access the underlying task IDs or embeddings during training or evaluation of our multi-task models, the mapping from task name to task embedding used in our work can be found in [`tdmpc2/common/__init__.py`](https://github.com/nicklashansen/tdmpc2/blob/7ec6bc83a82a5188ca3faddc59aea83f430ab570/tdmpc2/common/__init__.py#L26). As of April 2025, our codebase also provides basic support for other MuJoCo/Box2D Gymnasium tasks; refer to the `envs` directory for a list of tasks. It should be relatively straightforward to add support for custom tasks by following the examples in `envs`.\n\n**Note:** we also provide support for image observations in the DMControl tasks. Use the argument `obs=rgb` if you wish to train visual policies.\n\n\n## Example usage\n\nBelow, we provide examples of how to evaluate our released TD-MPC**2** checkpoints and how to train your own TD-MPC**2** agents.\n\n### Evaluation\n\nThe examples below show how to evaluate downloaded single-task and multi-task checkpoints.\n\n```\n$ python evaluate.py task=mt80 model_size=48 checkpoint=/path/to/mt80-48M.pt\n$ python evaluate.py task=mt30 model_size=317 checkpoint=/path/to/mt30-317M.pt\n$ python evaluate.py task=dog-run checkpoint=/path/to/dog-1.pt save_video=true\n```\n\nAll single-task checkpoints expect `model_size=5`. Multi-task checkpoints are available in multiple model sizes; supported values are `model_size={1, 5, 19, 48, 317}`. 
Note that single-task evaluation of multi-task checkpoints is currently not supported. See `config.yaml` for a full list of arguments.\n\n### Training\n\nThe examples below show how to train TD-MPC**2** on a single task (online RL) and on multi-task datasets (offline RL). We recommend configuring [Weights \u0026 Biases](https://wandb.ai) (`wandb`) in `config.yaml` to track training progress.\n\n```\n$ python train.py task=mt80 model_size=48 batch_size=1024\n$ python train.py task=mt30 model_size=317 batch_size=1024\n$ python train.py task=dog-run steps=7000000\n$ python train.py task=walker-walk obs=rgb\n```\n\nWe recommend using the default hyperparameters for single-task online RL, including the default model size of 5M parameters (`model_size=5`). Multi-task offline RL benefits from a larger model, but larger models are also increasingly costly to train and evaluate. Supported values are `model_size={1, 5, 19, 48, 317}`. See `config.yaml` for a full list of arguments.\n\n----\n\n## Citation\n\nIf you find our work useful, please consider citing our paper as follows:\n\n```\n@inproceedings{hansen2024tdmpc2,\n  title={TD-MPC2: Scalable, Robust World Models for Continuous Control},\n  author={Nicklas Hansen and Hao Su and Xiaolong Wang},\n  booktitle={International Conference on Learning Representations (ICLR)},\n  year={2024}\n}\n```\n\nas well as the original TD-MPC paper:\n\n```\n@inproceedings{hansen2022tdmpc,\n  title={Temporal Difference Learning for Model Predictive Control},\n  author={Nicklas Hansen and Xiaolong Wang and Hao Su},\n  booktitle={International Conference on Machine Learning (ICML)},\n  year={2022}\n}\n```\n\n----\n\n## Contributing\n\nYou are very welcome to contribute to this project. Feel free to open an issue or pull request if you have any suggestions or bug reports, but please review our [guidelines](CONTRIBUTING.md) first. 
Our goal is to build a codebase that can easily be extended to new environments and tasks, and we would love to hear about your experience!\n\n----\n\n## License\n\nThis project is licensed under the MIT License; see the `LICENSE` file for details. Note that the repository relies on third-party code, which is subject to its own licenses.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicklashansen%2Ftdmpc2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicklashansen%2Ftdmpc2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicklashansen%2Ftdmpc2/lists"}