{"id":13484187,"url":"https://github.com/danijar/dreamerv2","last_synced_at":"2025-04-08T11:07:49.655Z","repository":{"id":39991322,"uuid":"318652090","full_name":"danijar/dreamerv2","owner":"danijar","description":"Mastering Atari with Discrete World Models","archived":false,"fork":false,"pushed_at":"2023-01-21T07:43:42.000Z","size":2222,"stargazers_count":928,"open_issues_count":10,"forks_count":198,"subscribers_count":27,"default_branch":"main","last_synced_at":"2025-04-01T10:09:29.117Z","etag":null,"topics":["artificial-intelligence","atari","deep-learning","machine-learning","reinforcement-learning","research","robotics","video-prediction","world-models"],"latest_commit_sha":null,"homepage":"https://danijar.com/dreamerv2","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danijar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-04T22:44:45.000Z","updated_at":"2025-04-01T02:09:03.000Z","dependencies_parsed_at":"2023-02-12T08:20:28.709Z","dependency_job_id":null,"html_url":"https://github.com/danijar/dreamerv2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danijar%2Fdreamerv2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danijar%2Fdreamerv2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danijar%2Fdreamerv2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danijar%2Fdreamerv2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danijar","download_url":"https://codeload.github.com/danijar/dreamerv2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247829491,"owners_count":21002995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","atari","deep-learning","machine-learning","reinforcement-learning","research","robotics","video-prediction","world-models"],"created_at":"2024-07-31T17:01:20.380Z","updated_at":"2025-04-08T11:07:49.639Z","avatar_url":"https://github.com/danijar.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"**Status:** Stable release\n\n[![PyPI](https://img.shields.io/pypi/v/dreamerv2.svg)](https://pypi.python.org/pypi/dreamerv2/#history)\n\n# Mastering Atari with Discrete World Models\n\nImplementation of the [DreamerV2][website] agent in TensorFlow 2. Training\ncurves for all 55 games are included.\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"90%\" src=\"https://imgur.com/gO1rvEn.gif\"\u003e\n\u003c/p\u003e\n\nIf you find this code useful, please reference in your paper:\n\n```\n@article{hafner2020dreamerv2,\n  title={Mastering Atari with Discrete World Models},\n  author={Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy},\n  journal={arXiv preprint arXiv:2010.02193},\n  year={2020}\n}\n```\n\n[website]: https://danijar.com/dreamerv2\n\n## Method\n\nDreamerV2 is the first world model agent that achieves human-level performance\non the Atari benchmark. DreamerV2 also outperforms the final performance of the\ntop model-free agents Rainbow and IQN using the same amount of experience and\ncomputation. The implementation in this repository alternates between training\nthe world model, training the policy, and collecting experience and runs on a\nsingle GPU.\n\n![World Model Learning](https://imgur.com/GRC9QAw.png)\n\nDreamerV2 learns a model of the environment directly from high-dimensional\ninput images. For this, it predicts ahead using compact learned states. The\nstates consist of a deterministic part and several categorical variables that\nare sampled. The prior for these categoricals is learned through a KL loss. The\nworld model is learned end-to-end via straight-through gradients, meaning that\nthe gradient of the density is set to the gradient of the sample.\n\n![Actor Critic Learning](https://imgur.com/wH9kJ2O.png)\n\nDreamerV2 learns actor and critic networks from imagined trajectories of latent\nstates. The trajectories start at encoded states of previously encountered\nsequences. The world model then predicts ahead using the selected actions and\nits learned state prior. The critic is trained using temporal difference\nlearning and the actor is trained to maximize the value function via reinforce\nand straight-through gradients.\n\nFor more information:\n\n- [Google AI Blog post](https://ai.googleblog.com/2021/02/mastering-atari-with-discrete-world.html)\n- [Project website](https://danijar.com/dreamerv2/)\n- [Research paper](https://arxiv.org/pdf/2010.02193.pdf)\n\n## Using the Package\n\nThe easiest way to run DreamerV2 on new environments is to install the package\nvia `pip3 install dreamerv2`. The code automatically detects whether the\nenvironment uses discrete or continuous actions. Here is a usage example that\ntrains DreamerV2 on the MiniGrid environment:\n\n```python\nimport gym\nimport gym_minigrid\nimport dreamerv2.api as dv2\n\nconfig = dv2.defaults.update({\n    'logdir': '~/logdir/minigrid',\n    'log_every': 1e3,\n    'train_every': 10,\n    'prefill': 1e5,\n    'actor_ent': 3e-3,\n    'loss_scales.kl': 1.0,\n    'discount': 0.99,\n}).parse_flags()\n\nenv = gym.make('MiniGrid-DoorKey-6x6-v0')\nenv = gym_minigrid.wrappers.RGBImgPartialObsWrapper(env)\ndv2.train(env, config)\n```\n\n## Manual Instructions\n\nTo modify the DreamerV2 agent, clone the repository and follow the instructions\nbelow. There is also a Dockerfile available, in case you do not want to install\nthe dependencies on your system.\n\nGet dependencies:\n\n```sh\npip3 install tensorflow==2.6.0 tensorflow_probability ruamel.yaml 'gym[atari]' dm_control\n```\n\nTrain on Atari:\n\n```sh\npython3 dreamerv2/train.py --logdir ~/logdir/atari_pong/dreamerv2/1 \\\n  --configs atari --task atari_pong\n```\n\nTrain on DM Control:\n\n```sh\npython3 dreamerv2/train.py --logdir ~/logdir/dmc_walker_walk/dreamerv2/1 \\\n  --configs dmc_vision --task dmc_walker_walk\n```\n\nMonitor results:\n\n```sh\ntensorboard --logdir ~/logdir\n```\n\nGenerate plots:\n\n```sh\npython3 common/plot.py --indir ~/logdir --outdir ~/plots \\\n  --xaxis step --yaxis eval_return --bins 1e6\n```\n\n## Docker Instructions\n\nThe [Dockerfile](https://github.com/danijar/dreamerv2/blob/main/Dockerfile)\nlets you run DreamerV2 without installing its dependencies in your system. This\nrequires you to have Docker with GPU access set up.\n\nCheck your setup:\n\n```sh\ndocker run -it --rm --gpus all tensorflow/tensorflow:2.4.2-gpu nvidia-smi\n```\n\nTrain on Atari:\n\n```sh\ndocker build -t dreamerv2 .\ndocker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \\\n  python3 dreamerv2/train.py --logdir /logdir/atari_pong/dreamerv2/1 \\\n    --configs atari --task atari_pong\n```\n\nTrain on DM Control:\n\n```sh\ndocker build -t dreamerv2 . --build-arg MUJOCO_KEY=\"$(cat ~/.mujoco/mjkey.txt)\"\ndocker run -it --rm --gpus all -v ~/logdir:/logdir dreamerv2 \\\n  python3 dreamerv2/train.py --logdir /logdir/dmc_walker_walk/dreamerv2/1 \\\n    --configs dmc_vision --task dmc_walker_walk\n```\n\n## Tips\n\n- **Efficient debugging.** You can use the `debug` config as in `--configs\natari debug`. This reduces the batch size, increases the evaluation\nfrequency, and disables `tf.function` graph compilation for easy line-by-line\ndebugging.\n\n- **Infinite gradient norms.** This is normal and described under loss scaling in\nthe [mixed precision][mixed] guide. You can disable mixed precision by passing\n`--precision 32` to the training script. Mixed precision is faster but can in\nprinciple cause numerical instabilities.\n\n- **Accessing logged metrics.** The metrics are stored in both TensorBoard and\nJSON lines format. You can directly load them using `pandas.read_json()`. The\nplotting script also stores the binned and aggregated metrics of multiple runs\ninto a single JSON file for easy manual plotting.\n\n[mixed]: https://www.tensorflow.org/guide/mixed_precision\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanijar%2Fdreamerv2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanijar%2Fdreamerv2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanijar%2Fdreamerv2/lists"}