{"id":42671718,"url":"https://github.com/UCSC-VLAA/story-iter","last_synced_at":"2026-02-08T23:01:02.507Z","repository":{"id":252516400,"uuid":"840665645","full_name":"UCSC-VLAA/story-iter","owner":"UCSC-VLAA","description":"[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization","archived":false,"fork":false,"pushed_at":"2026-02-06T19:46:41.000Z","size":296756,"stargazers_count":947,"open_issues_count":0,"forks_count":131,"subscribers_count":13,"default_branch":"main","last_synced_at":"2026-02-07T05:45:03.087Z","etag":null,"topics":["diffusion-models","generative-art","generative-model","image-generation","storytelling","visual-storytelling"],"latest_commit_sha":null,"homepage":"https://jwmao1.github.io/storyiter/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UCSC-VLAA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-08-10T10:05:23.000Z","updated_at":"2026-02-06T19:46:45.000Z","dependencies_parsed_at":"2024-08-10T11:49:08.483Z","dependency_job_id":"296c569c-5896-47a4-b072-118d8d5021b3","html_url":"https://github.com/UCSC-VLAA/story-iter","commit_stats":null,"previous_names":["evilemogod/story-adapter","jwmao1/story-adapter","ucsc-vlaa/story-adapter","ucsc-vlaa/story-iter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/UCSC-VLAA/story-iter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCSC-VLAA%2Fstory-iter","tags_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCSC-VLAA%2Fstory-iter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCSC-VLAA%2Fstory-iter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCSC-VLAA%2Fstory-iter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UCSC-VLAA","download_url":"https://codeload.github.com/UCSC-VLAA/story-iter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCSC-VLAA%2Fstory-iter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29248487,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-08T22:49:53.206Z","status":"ssl_error","status_checked_at":"2026-02-08T22:49:51.384Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","generative-art","generative-model","image-generation","storytelling","visual-storytelling"],"created_at":"2026-01-29T11:00:33.986Z","updated_at":"2026-02-08T23:01:02.498Z","avatar_url":"https://github.com/UCSC-VLAA.png","language":"Python","funding_links":[],"categories":["Personalized Restoration"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./docs/logo.png\" height=150\u003e\n\u003c/p\u003e\n\n\n\n# Story-Iter: A Training-free Iterative Paradigm for Long Story 
Visualization\n\u003cspan\u003e\n\u003ca href=\"https://arxiv.org/abs/2410.06244\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2410.06244-b31b1b.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" height=22.5\u003e\u003c/a\u003e  \n\u003ca href=\"https://jwmao1.github.io/storyiter/\"\u003e\u003cimg src=\"https://img.shields.io/badge/project-StoryIter-purple.svg\" height=22.5\u003e\u003c/a\u003e\n\u003ca href=\"https://colab.research.google.com/drive/1sFbw0XlCQ6DBRU3s2n_F2swtNmHoicM-?usp=sharing\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=22.5\u003e\u003c/a\u003e\n\u003c/span\u003e\n\nCode for the paper [Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization](https://arxiv.org/abs/2410.06244)\n\nNote: This code base is still not complete. \n\n### About this repo:\n\nThe repository contains the official implementation of \"Story-Iter\".\n\n## Introduction 🦖\n\n\u003e Story visualization, the task of generating coherent images based on a narrative, has seen significant advancements with the emergence of text-to-image models, particularly diffusion models. However, maintaining semantic consistency, generating high-quality fine-grained interactions, and ensuring computational feasibility remain challenging, especially in long story visualization (_i.e._, up to 100 frames). In this work, we introduce **Story-Iter**, a new training-free iterative paradigm to enhance long-story generation. Unlike existing methods that rely on fixed reference images to construct a complete story, our approach features a novel external iterative paradigm, extending beyond the internal iterative denoising steps of diffusion models, to continuously refine each generated image by incorporating all reference images from the previous round. 
To achieve this, we propose a plug-and-play, training-free **g**lobal **r**eference **c**ross-**a**ttention (**GRCA**) module, modeling all reference frames with global embeddings, ensuring semantic consistency in long sequences. By progressively incorporating holistic visual context and text constraints, our iterative paradigm enables precise generation with fine-grained interactions, optimizing the story visualization step-by-step. Extensive experiments on the official story visualization dataset and our long story benchmark demonstrate that Story-Iter achieves state-of-the-art performance in long-story visualization (up to 100 frames), excelling in both semantic consistency and fine-grained interactions.\n\n\u003cbr\u003e\n\n\u003cimg src=\"./docs/teaser-github.jpg\" width=\"800\"/\u003e\n\n\n## News 🚀\n* **2024.10.10**: [Paper](https://arxiv.org/abs/2410.06244) is released on arXiv.\n* **2024.10.04**: Code released.\n* **2026.01.27**: Fast version released; visualizing a 100-frame story over 10 iterations takes only 20 minutes.\n* **2026.01.27**: ControlNet version released, supporting OpenPose skeletons as control signals.\n\n## Framework 🤖 \n\n\u003e Story-Iter framework. Illustration of the proposed iterative paradigm, which consists of initialization, iterations in Story-Iter, and implementation of Global Reference Cross-Attention (GRCA).\nStory-Iter first visualizes each image based only on the text prompt of the story and uses all results as reference images for the next round. \nIn the iterative paradigm, Story-Iter inserts GRCA into Stable Diffusion (SD). 
For the i-th iteration of each image visualization, GRCA aggregates the information flow of all reference images during the denoising process through cross-attention.\nAll results from this iteration are then used as reference images to guide the dynamic update of the story visualization in the next iteration.\n\n\u003cbr\u003e\n\n\u003cimg src=\"./docs/framework.jpg\" width=\"1080\"/\u003e\n\n\n## Quick Start 🔧\n\n### Installation\nThe project is built with Python 3.10.14, PyTorch 2.2.2, CUDA 12.1, and cuDNN 8.9.02.\nTo install, follow these instructions:\n~~~\n# git clone this repository\ngit clone https://github.com/UCSC-VLAA/Story-Iter.git\ncd Story-Iter\n\n# create new anaconda env\nconda create -n StoryAdapter python=3.10\nconda activate StoryAdapter \n\n# install packages\npip install -r requirements.txt\n~~~\n\n### Download the checkpoints\n- Download [RealVisXL_V4.0](https://huggingface.co/SG161222/RealVisXL_V4.0/tree/main) and put it into \"./RealVisXL_V4.0\"\n- Download [clip_image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder) and put it into \"./IP-Adapter/sdxl_models/image_encoder\"\n- Download [ip-adapter_sdxl](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter_sdxl.bin?download=true) and put it into \"./IP-Adapter/sdxl_models/ip-adapter_sdxl.bin\"\n\n### Running Demo\n\n~~~\npython run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin \n~~~\n\n### Customized Running\n\n~~~\npython run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin --story \"your prompt1\" \"your prompt2\" \"your prompt3\" ... 
\"your promptN\"\n~~~\nNote: For custom stories, we suggest the template [Character Definition + Interaction Definition + Scene Definition] for better story visualization performance. For example, if the Character Definition is \"One man wearing yellow robe,\" the Interaction Definition is \"dancing,\" and the Scene Definition is \"the palace hall,\" then the input prompt is \"One man wearing yellow robe dancing in the palace hall.\"\n\n## Performance 🎨\n\n### Regular-length Story Visualization \n- Download the [StorySalon](https://huggingface.co/datasets/haoningwu/StorySalon/resolve/main/testset.zip?download=true) test set.\n\n| GIF1 | GIF2  | GIF3  |\n| --- | --- | --- |\n| \u003cimg src=\"./docs/our_005169.gif\" alt=\"GIF 1\" width=\"224\"/\u003e  | \u003cimg src=\"./docs/our_007016.gif\" alt=\"GIF 2\" width=\"224\"/\u003e | \u003cimg src=\"./docs/our_007137.gif\" alt=\"GIF 3\" width=\"224\"/\u003e  |\n\n| GIF4 | GIF5  | GIF6  |\n| --- | --- | --- |\n| \u003cimg src=\"./docs/our_013804.gif\" alt=\"GIF 4\" width=\"224\"/\u003e  | \u003cimg src=\"./docs/our_015770.gif\" alt=\"GIF 5\" width=\"224\"/\u003e | \u003cimg src=\"./docs/our_000026.gif\" alt=\"GIF 6\" width=\"224\"/\u003e  |\n\n| GIF7 | GIF8  | GIF9  |\n| --- | --- | --- |\n| \u003cimg src=\"./docs/our_012060.gif\" alt=\"GIF 7\" width=\"224\"/\u003e  | \u003cimg src=\"./docs/our_008614.gif\" alt=\"GIF 8\" width=\"224\"/\u003e | \u003cimg src=\"./docs/our_008710.gif\" alt=\"GIF 9\" width=\"224\"/\u003e  |\n\n\n### Long Story Visualization \n\n\u003cbr\u003e\n\n\u003cimg src=\"./docs/comic1.png\" width=\"1080\"/\u003e\n\n\u003cbr\u003e\n\u003cimg src=\"./docs/comic7.png\" width=\"1080\"/\u003e\n\n\u003cbr\u003e\n\u003cimg src=\"./docs/comic3.png\" width=\"1080\"/\u003e\n\n### Running with Different Styles\ncomic style:\n~~~\npython run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt 
your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin --style comic\n~~~\n\n\u003cimg src=\"./docs/style_comic.png\" width=\"1080\"/\u003e\n\n\u003cbr\u003e\n\nfilm style:\n~~~\npython run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin --style film\n~~~\n\n\u003cimg src=\"./docs/style_film.png\" width=\"1080\"/\u003e\n\n\u003cbr\u003e\n\nrealistic style:\n~~~\npython run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin --style realistic\n~~~\n\n\u003cimg src=\"./docs/style_realistic.png\" width=\"1080\"/\u003e\n\n### Fast Running with LCM\n~~~\npython run_fast.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin \n~~~\n\n\u003cimg src=\"./docs/story_fast.jpg\" width=\"1080\"/\u003e\n\n### Running with ControlNet\n~~~\npython run_controlnet.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin --openpose_path your_path/openpose_root\n~~~\n\n\u003cimg src=\"./docs/controlnet.jpg\" width=\"1080\"/\u003e\n\n## Acknowledgement 🍻\n\nWe deeply appreciate these wonderful open-source projects: [stablediffusion](https://github.com/Stability-AI/StableDiffusion), [clip](https://github.com/openai/CLIP), [ip-adapter](https://github.com/tencent-ailab/IP-Adapter), [storygen](https://github.com/haoningwu3639/StoryGen), [storydiffusion](https://github.com/HVision-NKU/StoryDiffusion), [theatergen](https://github.com/donahowe/TheaterGen), [timm](https://github.com/huggingface/pytorch-image-models).\n\n## Citation 🔖\n\nIf you find this repository useful, please consider giving a star ⭐ and 
citation 🙈:\n\n```\n@misc{mao2024story_adapter,\n  title={{Story-Adapter: A Training-free Iterative Framework for Long Story Visualization}},\n  author={Mao, Jiawei and Huang, Xiaoke and Xie, Yunfei and Chang, Yuanqi and Hui, Mude and Xu, Bingjie and Zhou, Yuyin},\n  journal={arXiv},\n  volume={abs/2410.06244},\n  year={2024},\n}\n```\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUCSC-VLAA%2Fstory-iter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FUCSC-VLAA%2Fstory-iter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUCSC-VLAA%2Fstory-iter/lists"}