{"id":26856462,"url":"https://github.com/qiwang067/LS-Imagine","last_synced_at":"2025-03-31T00:02:57.699Z","repository":{"id":261375300,"uuid":"867680919","full_name":"qiwang067/LS-Imagine","owner":"qiwang067","description":"[ICLR 2025 Oral] PyTorch code for the paper \"Open-World Reinforcement Learning over Long Short-Term Imagination\"","archived":false,"fork":false,"pushed_at":"2025-03-13T05:51:29.000Z","size":3766,"stargazers_count":36,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-13T06:29:40.603Z","etag":null,"topics":["minecraft","minedojo","reinforcement-learning","rl","visual-reinforcement-learning","visual-rl","world-model"],"latest_commit_sha":null,"homepage":"https://qiwang067.github.io/ls-imagine","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qiwang067.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-04T14:13:16.000Z","updated_at":"2025-03-13T05:51:32.000Z","dependencies_parsed_at":"2025-01-17T17:28:34.666Z","dependency_job_id":"5f50a550-d9b0-4f80-abcb-236c91d1683c","html_url":"https://github.com/qiwang067/LS-Imagine","commit_stats":null,"previous_names":["qiwang067/ls-imagine"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qiwang067%2FLS-Imagine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qiwang067%2FLS-Imagine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qiwang067%2FLS-
Imagine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qiwang067%2FLS-Imagine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qiwang067","download_url":"https://codeload.github.com/qiwang067/LS-Imagine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246395595,"owners_count":20770243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["minecraft","minedojo","reinforcement-learning","rl","visual-reinforcement-learning","visual-rl","world-model"],"created_at":"2025-03-31T00:02:21.060Z","updated_at":"2025-03-31T00:02:57.687Z","avatar_url":"https://github.com/qiwang067.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003ch1 align=\"center\"\u003e\n [ICLR 2025 Oral] \u003cimg src=\"./assets/minecraft.png\" alt=\"logo\" style=\"width: 32px; height: 32px; margin-right: 7px; margin-bottom: 0px;\"\u003eOpen-World Reinforcement Learning over Long Short-Term Imagination \u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n    Jiajian Li*\n    ·\n    Qi Wang*\n    ·\n    Yunbo Wang\n    ·\n    Xin Jin\n    ·\n    Yang Li\n    ·\n    Wenjun Zeng\n    ·\n    Xiaokang Yang\n  \u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e \u003ca href=\"https://openreview.net/pdf?id=vzItLaEoDa\" target=\"_blank\"\u003e Paper \u003c/a\u003e \u0026nbsp;\u0026nbsp; |\u0026nbsp;\u0026nbsp;   \u003ca href=\"https://arxiv.org/pdf/2410.03618\" target=\"_blank\"\u003e arXiv \u003c/a\u003e \u0026nbsp;\u0026nbsp; | 
\u0026nbsp;\u0026nbsp; \u003ca href=\"https://qiwang067.github.io/ls-imagine\" target=\"_blank\"\u003e Website \u003c/a\u003e \u0026nbsp;\u0026nbsp; \u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\u003c/div\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003e\u003cb\u003e⚡ Quick Start\u003c/b\u003e\u003c/a\u003e |\n  \u003ca href=\"#pretrained-weights\"\u003e\u003cb\u003e📥 Checkpoints Download\u003c/b\u003e\u003c/a\u003e |\n  \u003ca href=\"#citation\"\u003e\u003cb\u003e📝 Citation\u003c/b\u003e\u003c/a\u003e \u003cbr\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/overview.png\" alt=\"Teaser image\" /\u003e\n\u003c/p\u003e\n\n\u003cp style=\"text-align:justify\"\u003e\n  Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be \"short-sighted\", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a \u003ci\u003elong short-term world model\u003c/i\u003e. To achieve this, we simulate goal-conditioned jumpy state transitions and compute corresponding affordance maps by zooming in on specific areas within single images. This facilitates the integration of direct long-term values into behavior learning. 
Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/success_rate_with_barplot.png\" alt=\"evaluation_results\" width=\"90%\"/\u003e\n\u003c/p\u003e\n\n\u003c!-- # Open-World Reinforcement Learning over Long Short-Term Imagination\n#### Open-World Reinforcement Learning over Long Short-Term Imagination\n\nJiajian Li*, Qi Wang*, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang\n\n[[arXiv]](https://arxiv.org/pdf/2410.03618)  [[Project Page]](https://qiwang067.github.io/ls-imagine) --\u003e\n\n## Quick Start\n### Install the Environment\nLS-Imagine is implemented and tested on Ubuntu 20.04 with Python 3.9:\n\n1. Create an environment\n    ```bash\n    conda create -n ls_imagine python=3.9\n    conda activate ls_imagine\n    ```\n\n2. Install Java: JDK `1.8.0_171`. Then install the [MineDojo](https://github.com/MineDojo/MineDojo) environment and [MineCLIP](https://github.com/MineDojo/MineCLIP) following their official documentation. During the installation of MineDojo, various errors may occur.\n\n\u003e [!IMPORTANT]\n\u003e**For the detailed installation process and solutions to common errors, see [here](./docs/minedojo_installation.md).**\n\n3. Install dependencies\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n4. Download the MineCLIP weights [here](https://drive.google.com/file/d/1uaZM1ZLBz2dZWcn85rZmjP7LV6Sg5PZW/view?usp=sharing) and place them at `./weights/mineclip_attn.pth`.\n\n5. We provide two options for recording data during the training process: TensorBoard and Weights \u0026 Biases (wandb). \n\n   - To use TensorBoard, set `use_wandb` to `False` in the `./config.yaml` file.\n   - To use wandb (optional), set `use_wandb` to `True` in the `./config.yaml` file. 
Additionally, retrieve your wandb API key and set it in the `./config.yaml` file under the field `wandb_key: {your_wandb_api_key}`.\n\n\n### Pretrained Weights\n\nWe provide pretrained weights of **LS-Imagine** for the tasks mentioned in the paper. You can download them using the links in the table below and rename the downloaded file to `latest.pt`:\n\n\u003cdiv align=\"center\"\u003e\n\n| Task Name                  | Weight File                                                                                   |\n|----------------------------|-----------------------------------------------------------------------------------------------|\n| harvest_log_in_plains      | [latest_log.pt](https://drive.google.com/file/d/1_mhz49YPJDMNmPB-WwbzCG6zCTUidFQ3/view?usp=drive_link)                                                                |\n| harvest_water_with_bucket  | [latest_water.pt](https://drive.google.com/file/d/1DxtQ-ZckTVw1tFySspKRaxoaShDi_b8A/view?usp=drive_link)                                                              |\n| harvest_sand               | [latest_sand.pt](https://drive.google.com/file/d/1xa6JwV7rh-IfGoFjWDoneWpaTwPxLWb_/view?usp=drive_link)                                                               |\n| mine_iron_ore              | [latest_iron.pt](https://drive.google.com/file/d/1FOIRGQJvgeQptK8-4cVVGv_jeNIy8kw7/view?usp=drive_link)                                                               |\n| shear_sheep                | [latest_wool.pt](https://drive.google.com/file/d/1sx7IVOZ1JYs0BJHD3TWZcPb-f-x5xfA3/view?usp=drive_link)                                                               |\n\n\u003c/div\u003e\n\n\u003ca name=\"evaluation_with_checkpoints\"\u003e\u003c/a\u003e\nTo start an evaluation run from one of these checkpoints:\n\n1. Set up the task for evaluation ([instructions here](./docs/task_setups.md)).\n\n2. 
Run the following command to test the success rate:\n    ```bash\n    sh ./scripts/test.sh /path/to/latest.pt 100 test_harvest_log_in_plains\n    ```\n\n\u003c!-- 5. Download the Multimodal U-Net weight [here](https://drive.google.com/file/d/1Ylhw-MkT1UIUX5EyOosNmF09bWSlEjSf/view?usp=sharing), rename it to `swin_unet_checkpoint.pth`, place it at `finetune_unet/finetune_checkpoints/harvest_wool_in_plains` --\u003e\n\n## Quick Links\n\n- [Training LS-Imagine in MineDojo](#training-ls-imagine-in-minedojo)\n  - [U-Net Finetuning for Affordance Map Generation](#u-net-finetuning-for-affordance-map-generation)\n  - [World Model and Behavior Learning](#world-model-and-behavior-learning)\n- [Success Rate Evaluation](#success-rate-evaluation)\n- [Citation](#citation)\n- [Credits](#credits)\n\n\u003ca name=\"lsimagine_train\"\u003e\u003c/a\u003e\n\n## Training LS-Imagine in MineDojo\nLS-Imagine mainly consists of two stages: [fine-tuning a multimodal U-Net for generating affordance maps](#unet_finetune) and [learning world models and behaviors](#agent_learn). \n\nYou can either set up custom tasks in MineDojo ([instructions here](./docs/task_setups.md)) or use the task setups mentioned in our [paper](https://arxiv.org/pdf/2410.03618). LS-Imagine allows you to start from any stage of the pipeline, as we provide corresponding checkpoint files for each stage to ensure flexibility.\n\n\u003ca name=\"unet_finetune\"\u003e\u003c/a\u003e\n### U-Net Finetuning for Affordance Map Generation\n\n1. Download the pretrained U-Net weights from [here](https://drive.google.com/file/d/1N2VTC458txxW5UABQDgmRTEYzeYTEjIX/view?usp=sharing) and save them to `./affordance_map/pretrained_unet_checkpoint/swin_unet_checkpoint.pth`.\n\n2. Set up the task ([instructions here](./docs/task_setups.md)) and run the following command to collect data:\n    ```bash\n    sh ./scripts/collect.sh your_task_name\n    ```\n\n3. 
Annotate the collected data using a method based on sliding bounding box scanning and simulated exploration to generate the fine-tuning dataset:\n    ```bash\n    sh ./scripts/affordance.sh your_task_name your_prompt\n    ```\n\n4. Fine-tune the pretrained U-Net weights using the annotated dataset to generate task-specific affordance maps:\n    ```bash\n    sh ./scripts/finetune_unet.sh your_task_name\n    ```\n\n5. After training, the fine-tuned multimodal U-Net weights for the specified task will be saved in `./affordance_map/model_out`.\n\n\u003ca name=\"agent_learn\"\u003e\u003c/a\u003e\n### World Model and Behavior Learning\n\nBefore starting the learning process for the world model and behavior, ensure you have obtained the multimodal U-Net weights. We provide the pretrained U-Net weights ([link here](https://drive.google.com/file/d/1N2VTC458txxW5UABQDgmRTEYzeYTEjIX/view?usp=sharing)) and the task-specific fine-tuned U-Net weights: \n\u003cdiv align=\"center\"\u003e\n\n| Task Name                  | Weight File                                                                                                                   |\n|----------------------------|-------------------------------------------------------------------------------------------------------------------------------|\n| harvest_log_in_plains      | [swin_unet_checkpoint_log.pth](https://drive.google.com/file/d/1UxLGThaI7_iJ0AR_rNZ4RSQwrSydFN40/view?usp=sharing)             |\n| harvest_water_with_bucket  | [swin_unet_checkpoint_water.pth](https://drive.google.com/file/d/1Z-7vDNOiKxFE0iaApjYznALkLu8F4cXD/view?usp=sharing)          |\n| harvest_sand               | [swin_unet_checkpoint_sand.pth](https://drive.google.com/file/d/1ZeKVY6Y99Nch_wDXOgl_WX1IEyuIFNrs/view?usp=sharing)          |\n| mine_iron_ore              | [swin_unet_checkpoint_iron.pth](https://drive.google.com/file/d/1_sUWXeVEFEYHpQmw0115pMFYJxKmMZyL/view?usp=sharing)          |\n| shear_sheep                | 
[swin_unet_checkpoint_wool.pth](https://drive.google.com/file/d/1uaZM1ZLBz2dZWcn85rZmjP7LV6Sg5PZW/view?usp=sharing)          |\n\n\u003c/div\u003e\n\nYou can download these weights using the links provided in the table above and place them at `./affordance_map/finetune_unet/finetune_checkpoints/{task_name}/swin_unet_checkpoint.pth`. Then:\n\n1. Set up the task and correctly configure the `unet_checkpoint_dir` to ensure the U-Net weights are properly located and loaded ([instructions here](./docs/task_setups.md)).\n\n2. Run the following command to start training the world model and behavior:\n    ```bash\n    sh ./scripts/train.sh your_task_name\n    ```\n\n\u003ca name=\"evaluation\"\u003e\u003c/a\u003e\n## Success Rate Evaluation\n\nAfter completing the training, the agent's weight file `latest.pt` will be saved in the `./logdir` directory. You can evaluate the performance of LS-Imagine as described [here](#evaluation_with_checkpoints).\n\u003c!--\nAdditionally, we provide pretrained weights for the tasks mentioned in the paper. 
You can download them using the links in the table below and rename the downloaded file to `latest.pt`:\n\n\u003cdiv align=\"center\"\u003e\n\n| Task Name                  | Weight File                                                                                   |\n|----------------------------|-----------------------------------------------------------------------------------------------|\n| harvest_log_in_plains      | [latest_log.pt](https://drive.google.com/file/d/1_mhz49YPJDMNmPB-WwbzCG6zCTUidFQ3/view?usp=drive_link)                                                                |\n| harvest_water_with_bucket  | [latest_water.pt](https://drive.google.com/file/d/1DxtQ-ZckTVw1tFySspKRaxoaShDi_b8A/view?usp=drive_link)                                                              |\n| harvest_sand               | [latest_sand.pt](https://drive.google.com/file/d/1xa6JwV7rh-IfGoFjWDoneWpaTwPxLWb_/view?usp=drive_link)                                                               |\n| mine_iron_ore              | [latest_iron.pt](https://drive.google.com/file/d/1FOIRGQJvgeQptK8-4cVVGv_jeNIy8kw7/view?usp=drive_link)                                                               |\n| shear_sheep                | [latest_wool.pt](https://drive.google.com/file/d/1sx7IVOZ1JYs0BJHD3TWZcPb-f-x5xfA3/view?usp=drive_link)                                                               |\n\n\u003c/div\u003e\n\n\n1. Set up the task for evaluation ([instructions here](./docs/task_setups.md)).\n2. Retrieve your **Weights \u0026 Biases (wandb)** API key and set it in the `./config.yaml` file under the field `wandb_key: {your_wandb_api_key}`.\n3. 
Run the following command to test the success rate:\n    ```bash\n    MINEDOJO_HEADLESS=1 python test.py \\\n        --configs minedojo \\\n        --task minedojo_test_harvest_log_in_plains \\\n        --logdir ./logdir \\\n        --agent_checkpoint_dir {path_to_latest.pt} \\\n        --eval_episode_num 100\n    ```\n--\u003e\n\n## Citation\nIf you find this repo useful, please cite our paper:\n```bib\n@inproceedings{li2025open,\n    title={Open-World Reinforcement Learning over Long Short-Term Imagination}, \n    author={Jiajian Li and Qi Wang and Yunbo Wang and Xin Jin and Yang Li and Wenjun Zeng and Xiaokang Yang},\n    booktitle={ICLR},\n    year={2025}\n}\n```\n\n\n## Credits\nThe code is based on the implementations of [dreamerv3-torch](https://github.com/NM512/dreamerv3-torch) and [Swin-Unet](https://github.com/HuCaoFighting/Swin-Unet). Thanks to the authors!\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqiwang067%2FLS-Imagine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqiwang067%2FLS-Imagine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqiwang067%2FLS-Imagine/lists"}