{"id":20599637,"url":"https://github.com/adamdad/vico","last_synced_at":"2025-08-20T16:33:07.473Z","repository":{"id":247514262,"uuid":"812686396","full_name":"Adamdad/vico","owner":"Adamdad","description":"Vico: Compositional Video Generation as Flow Equalization","archived":false,"fork":false,"pushed_at":"2024-11-15T23:13:17.000Z","size":9148,"stargazers_count":54,"open_issues_count":5,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-11T13:23:36.825Z","etag":null,"topics":["aigc","diffusion-models","video"],"latest_commit_sha":null,"homepage":"https://adamdad.github.io/vico/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Adamdad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-09T15:42:28.000Z","updated_at":"2024-11-27T17:09:52.000Z","dependencies_parsed_at":"2024-07-09T06:26:50.220Z","dependency_job_id":"832c3e5c-aee3-4bce-9f1b-a90407915753","html_url":"https://github.com/Adamdad/vico","commit_stats":null,"previous_names":["adamdad/vico"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adamdad%2Fvico","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adamdad%2Fvico/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adamdad%2Fvico/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adamdad%2Fvico/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Adamdad","download_url":"htt
ps://codeload.github.com/Adamdad/vico/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230438191,"owners_count":18225871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc","diffusion-models","video"],"created_at":"2024-11-16T08:33:38.502Z","updated_at":"2024-12-19T13:08:51.414Z","avatar_url":"https://github.com/Adamdad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🌊 Vico: Compositional Video Generation as Flow Equalization 🌊\n\n\u003cp style=\"font-family: 'Indie Flower', cursive;\" align=\"center\"\u003e\nAll roads lead to Rome!\n\u003c/p\u003e\n\nThis repository contains the official implementation of **Vico**. Vico provides a unified solution for compositional video generation by equalizing the information flow of text tokens.\n\n**Compositional Video Generation as Flow Equalization**\n\n🥯[[Project Page](https://adamdad.github.io/vico/)] 📝[[Paper](https://arxiv.org/abs/2407.06182)] \u003c/\u003e[[code](https://github.com/Adamdad/vico)]\n\nXingyi Yang, Xinchao Wang\n\nNational University of Singapore\n\n\n![pipeline](https://adamdad.github.io/vico/static/images/teaser.jpg)\n\n\n\u003e We introduce Vico, a generic framework for compositional video generation that explicitly ensures all concepts are represented properly. At its core, Vico analyzes how input tokens influence the generated video, and adjusts the model to prevent any single concept from dominating. We apply our method to multiple diffusion-based video models for compositional T2V and video editing. 
Empirical results demonstrate that our framework significantly enhances the compositional richness and accuracy of the generated videos.\n\n# Results\n| Prompt | Baseline | +Vico |\n| --- |  --- |  --- | \n| A **crab** **DJing** at a **beach** party during sunset. |![crab_base](static/crab_base.gif) |![crab_flow](static/crab_flow.gif) |\n| A **falcon** as a **messenger** in a sprawling **medieval city**. | ![fac_base](static/fac_base.gif)| ![fac_flow](static/fac_flow.gif) |\n| A confused **panda** in **calculus class**. | ![](static/panda_base.gif)|![](static/panda_flow.gif) |\n\n# Installation\n- Environment\n    ```shell\n    pip install diffusers==0.26.3\n    ```\n\n- For VideoCrafterv2, it is recommended to first download the `diffusers` checkpoints from [`adamdad/videocrafterv2_diffusers`](https://huggingface.co/adamdad/videocrafterv2_diffusers). They were obtained by converting the official checkpoint to the `diffusers` format.\n    ```shell\n    git lfs install\n    git clone https://huggingface.co/adamdad/videocrafterv2_diffusers\n    ```\n\n# Usage\n```shell\nexport PYTHONPATH=\"$PWD\"\npython videocrafterv2_vico.py \\\n    --prompts XXX \\\n    --unet_path $PATH_TO_VIDEOCRAFTERV2 \\\n    --attribution_mode \"latent_attention_flow_st_soft\" \n```\n\n# 📝 Changelog \n- **[2024.07.09]**: Released the arXiv paper and code for Vico on VideoCrafterv2.\n\n## Acknowledgement\n\nWe are mostly inspired by [Attend\u0026Excite](https://github.com/yuval-alaluf/Attend-and-Excite) for text-to-image generation. 
\nWe thank [@Yuanshi9815](https://github.com/Yuanshi9815) for valuable discussions.\n\n## Citation\n\n```bibtex\n@misc{yang2024compositional,\n    title={Compositional Video Generation as Flow Equalization},\n    author={Xingyi Yang and Xinchao Wang},\n    year={2024},\n    eprint={2407.06182},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadamdad%2Fvico","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadamdad%2Fvico","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadamdad%2Fvico/lists"}