{"id":13488910,"url":"https://github.com/Vchitect/VEnhancer","last_synced_at":"2025-03-28T02:31:24.769Z","repository":{"id":247797130,"uuid":"826847986","full_name":"Vchitect/VEnhancer","owner":"Vchitect","description":"Official codes of VEnhancer: Generative Space-Time Enhancement for Video Generation","archived":false,"fork":false,"pushed_at":"2024-09-16T13:48:02.000Z","size":32513,"stargazers_count":515,"open_issues_count":21,"forks_count":28,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-03-25T11:09:53.246Z","etag":null,"topics":["aigc-enhancement","diffusion-models","frame-interpolation","text-to-video","video-enhancement","video-generation","video-super-resolution","video-to-video"],"latest_commit_sha":null,"homepage":"https://vchitect.github.io/VEnhancer-project/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Vchitect.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-10T13:52:26.000Z","updated_at":"2025-03-24T08:52:08.000Z","dependencies_parsed_at":"2024-10-31T01:31:02.707Z","dependency_job_id":"1ac84295-f6a0-4383-964e-2b2ad2ebc5e2","html_url":"https://github.com/Vchitect/VEnhancer","commit_stats":null,"previous_names":["vchitect/venhancer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FVEnhancer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FVEnhancer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FVEnh
ancer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FVEnhancer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Vchitect","download_url":"https://codeload.github.com/Vchitect/VEnhancer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245957683,"owners_count":20700316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc-enhancement","diffusion-models","frame-interpolation","text-to-video","video-enhancement","video-generation","video-super-resolution","video-to-video"],"created_at":"2024-07-31T18:01:24.010Z","updated_at":"2025-03-28T02:31:21.040Z","avatar_url":"https://github.com/Vchitect.png","language":"Python","funding_links":[],"categories":["Video Generation","Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eVEnhancer: Generative Space-Time Enhancement\u003cbr\u003efor Video Generation\u003c/h1\u003e\n\n\u003cdiv\u003e\n    \u003ca href='https://scholar.google.com/citations?user=GUxrycUAAAAJ\u0026hl=zh-CN' target='_blank'\u003eJingwen He\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://tianfan.info' target='_blank'\u003eTianfan Xue\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://github.com/ChrisLiu6' target='_blank'\u003eDongyang Liu\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://github.com/0x3f3f3f3fun' target='_blank'\u003eXinqi Lin\u003c/a\u003e,\u0026emsp;\n\u003c/div\u003e\n    \u003ca href='https://gaopengcuhk.github.io' target='_blank'\u003ePeng 
Gao\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?user=GMzzRRUAAAAJ\u0026hl=en' target='_blank'\u003eDahua Lin\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?user=gFtI-8QAAAAJ\u0026hl=en' target='_blank'\u003eYu Qiao\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://wlouyang.github.io' target='_blank'\u003eWanli Ouyang\u003c/a\u003e,\u0026emsp;\n    \u003ca href='https://liuziwei7.github.io' target='_blank'\u003eZiwei Liu\u003c/a\u003e\n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n    The Chinese University of Hong Kong,\u0026emsp;Shanghai Artificial Intelligence Laboratory,\u0026emsp;\n\u003c/div\u003e\n\u003cdiv\u003e\n\n\u003c/div\u003e\n\u003cdiv\u003e\n S-Lab, Nanyang Technological University\u0026emsp;\n\u003c/div\u003e\n\n\u003cdiv\u003e\n    \u003ch4 align=\"center\"\u003e\n        \u003ca href=\"https://vchitect.github.io/VEnhancer-project/\" target='_blank'\u003e\n        \u003cimg src=\"https://img.shields.io/badge/🐳-Project%20Page-blue\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://arxiv.org/abs/2407.07667\" target='_blank'\u003e\n        \u003cimg src=\"https://img.shields.io/badge/arXiv-2407.07667-b31b1b.svg\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://youtu.be/QMR_5weifGg\" target='_blank'\u003e\n        \u003cimg src=\"https://img.shields.io/badge/Demo%20Video-%23FF0000.svg?logo=YouTube\u0026logoColor=white\"\u003e\n        \u003c/a\u003e\n        \u003c!-- \u003cimg src=\"https://api.infinitescript.com/badgen/count?name=\"\u003e --\u003e\n    \u003c/h4\u003e\n\u003c/div\u003e\n\n\u003cstrong\u003eVEnhancer is an all-in-one generative video enhancement model that performs spatial super-resolution, temporal super-resolution, and video refinement for AI-generated videos.\u003c/strong\u003e\n\n\u003ctable class=\"center\"\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"1\"\u003eAIGC video\u003c/td\u003e\n    \u003ctd 
colspan=\"1\"\u003e+VEnhancer\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/input_fish.gif width=\"380\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/out_fish.gif width=\"380\"\u003e\n  \u003c/td\u003e\n  \u003c/tr\u003e\n\n\n\n\u003c/table\u003e\n\n:open_book: For more visual results, check out our \u003ca href=\"https://vchitect.github.io/VEnhancer-project/\" target=\"_blank\"\u003eproject page\u003c/a\u003e.\n\n\n---\n\n\u003c/div\u003e\n\n\n## 🔥 Update\n- [2024.09.12] 😸 Release our version 2 checkpoint: **[venhancer_v2.pt](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt)**. It is less creative, but generates more texture details and preserves identity better, which makes it more suitable for enhancing videos with profiles.\n- [2024.09.10] 😸 Support **Multiple GPU Inference** and **tiled VAE** for temporal VAE decoding, with more stable performance for long video enhancement.\n- [2024.08.18] 😸 Support enhancement for **arbitrarily long videos** (by splitting the videos into multiple chunks with overlaps); **Faster sampling** with only 15 steps without obvious quality loss (by setting `--solver_mode 'fast'` in the script command); Use **temporal VAE** to reduce video flickering.\n- [2024.07.28] 🔥 Inference code and pretrained video enhancement model are released.\n- [2024.07.10] 🤗 This repo is created.\n\n\u003c!-- ## Open Source Plan\n\n- [x] Release code of Multiple GPU Inference.\n- [x] Release code of tiled VAE.\n- [ ] Release model that is optimized for better identity preservation. --\u003e\n\n\u003c!-- :star::star::star: Star us :star::star::star:! And we will speed up the open-sourcing process :heart:. --\u003e\n\n## :astonished: Gallery\n\n\n| Inputs \u0026 Results | Model Version |\n| :---------- | :-: |\n|Prompt: A close-up shot of a woman standing in a dimly lit room. 
she is wearing a traditional chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece.\u003cbr/\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/4a514853-65f6-40b8-8b5d-d14835bb9297\" width=\"100%\" controls autoplay\u003e\u003c/video\u003efrom [Open-Sora](https://github.com/hpcaitech/Open-Sora)|\u003cdiv style=\"width:100px\"\u003ev2\u003c/div\u003e|\n|Prompt: Einstein plays guitar.\u003cbr/\u003e\u003ctable class=\"center\"\u003e\u003ctr\u003e\u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/aa76e8a2-14e2-49a1-915c-147838476ab1\" width=\"50%\" controls autoplay\u003e\u003c/video\u003e\u003c/td\u003e\u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/f08e6f77-19d4-4847-9356-739a84da38b2\" width=\"50%\" controls autoplay\u003e\u003c/video\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003efrom [Kling](https://kling.kuaishou.com/en)|\u003cdiv style=\"width:100px\"\u003ev2\u003c/div\u003e|\n|Prompt: A girl eating noodles.\u003cbr/\u003e\u003ctable class=\"center\"\u003e\u003ctr\u003e\u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/cc01bf80-8b49-4314-97a3-1e1ec2d16d6a\" width=\"50%\" controls autoplay\u003e\u003c/video\u003e\u003c/td\u003e\u003ctd\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/ce923609-614b-4f87-ba2b-7b831edce40f\" width=\"50%\" controls autoplay\u003e\u003c/video\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003efrom [Kling](https://kling.kuaishou.com/en)| \u003cdiv style=\"width:100px\"\u003ev2\u003c/div\u003e|\n|Prompt: A little brick man visiting an art gallery.\u003cbr/\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/39a39459-4a69-4ef7-80ef-74df066decb5\" width=\"100%\" controls autoplay\u003e\u003c/video\u003e\u003cbr/\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/d110bec4-9ea1-4348-a6db-e9dd6cce4bc2\" width=\"100%\" controls 
autoplay\u003e\u003c/video\u003efrom [Kling](https://kling.kuaishou.com/en) | \u003cdiv style=\"width:100px\"\u003ev1\u003c/div\u003e|\n\u003c!-- |Prompt: A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea.\u003cbr/\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/d6ba4ebe-a970-4db1-ade1-03bfa8e52a20\" width=\"100%\" controls autoplay\u003e\u003c/video\u003e\u003cvideo src=\"https://github.com/user-attachments/assets/bf97116e-2fbc-4e29-b559-4fe08dc65c02\" width=\"100%\" controls autoplay\u003e\u003c/video\u003efrom [CogVideoX](https://github.com/THUDM/CogVideo)|\u003cdiv style=\"width:100px\"\u003ev2\u003c/div\u003e| --\u003e\n\n## 🎬 Overview\nVEnhancer achieves spatial super-resolution, temporal super-resolution (i.e., frame interpolation), and video refinement in **one model**.\nIt adapts flexibly to different upsampling factors (e.g., 1x~8x) for either spatial or temporal super-resolution. 
Besides, it provides flexible control over the refinement strength to handle diverse video artifacts.\n\nIt follows ControlNet and copies the architectures and weights of the multi-frame encoder and middle block of a pretrained video diffusion model to build a trainable condition network.\nThis **video ControlNet** accepts both low-resolution key frames and full frames of noisy latents as inputs.\nIn addition to the timestep $t$ and prompt $c_{text}$, the noise level $\\sigma$ of the noise augmentation and the downscaling factor $s$ serve as additional network conditioning through our proposed **video-aware conditioning**.\n\u003c!-- ![overall_structure](assets/venhancer_arch.png) --\u003e\n\n\n## :gear: Installation\n```shell\n# clone this repo\ngit clone https://github.com/Vchitect/VEnhancer.git\ncd VEnhancer\n\n# create environment\nconda create -n venhancer python=3.10\nconda activate venhancer\npip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2\npip install -r requirements.txt\n```\nNote that the `ffmpeg` command must be available. If you have sudo access, you can install it with the following command:\n```shell\nsudo apt-get update \u0026\u0026 sudo apt-get install ffmpeg libsm6 libxext6 -y\n```\n\n## :dna: Pretrained Models\n| Model Name | Description | HuggingFace | BaiduNetdisk  |\n| :---------: | :----------: | :----------: | :----------: |\n| venhancer_paper.pth  | very creative, strong refinement, but sometimes over-smooths edges and texture details. | [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_paper.pt?download=true) | [download](https://pan.baidu.com/s/15t20RGvEHqJOMmhA_zRLiA?pwd=cpsd)|\n| venhancer_v2.pth  | less creative, but can generate better texture details, and has better identity preservation. 
| [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt?download=true) | [download](https://pan.baidu.com/s/1mc4s5xqcVqKyL-GwkE0loA?pwd=bbqn)|\n\n## 💫 Inference\n1) Download the VEnhancer model and put the checkpoint in the `VEnhancer/ckpts` directory (optional, as the download can be done automatically).\n2) Run the following command:\n```bash\n  bash run_VEnhancer.sh\n```\nfor single GPU inference (at least an A100 80G is required), or\n```bash\n  bash run_VEnhancer_MultiGPU.sh\n```\nfor multiple GPU inference.\n\nIn `run_VEnhancer.sh` or `run_VEnhancer_MultiGPU.sh`, you can set the following options:\n- `version` selects the checkpoint. We now provide two choices: `v1` and `v2` (venhancer_paper.pth and venhancer_v2.pth, respectively).\n- `up_scale` is the upsampling factor ($1\\sim8$) for spatial super-resolution. $\\times3,4$ are recommended. Note that the target resolution will be adjusted to be no higher than 2K.\n- `target_fps` is the target frame rate; the default is 24.\n- `noise_aug` is the noise level ($0\\sim300$) for noise augmentation. Higher noise corresponds to stronger refinement. $200\\sim300$ are recommended.\n- Regarding the prompt, you can use `--filename_as_prompt` to automatically use the filename as the prompt; write the prompt to a txt file and specify its path with `--prompt_path [your_prompt_path]`; or provide the prompt directly with `--prompt [your_prompt]`.\n- Regarding sampling, `--solver_mode fast` uses a fixed 15 sampling steps. For `--solver_mode normal`, you can modify `steps` to trade off efficiency against video quality.\n\n### Gradio\nThe same functionality is also available as a Gradio demo. 
Follow the guidelines above and specify the model version (v1 or v2).\n```shell\npython gradio_app.py --version v1\n```\n\n\n## BibTeX\nIf you use our work in your research, please cite our publication:\n```\n@article{he2024venhancer,\n  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},\n  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},\n  journal={arXiv preprint arXiv:2407.07667},\n  year={2024}\n}\n```\n\n## 🤗 Acknowledgements\nOur codebase builds on [modelscope](https://github.com/modelscope/modelscope).\nThanks to the authors for sharing their awesome codebase!\n\n## 📧 Contact\nIf you have any questions, please feel free to reach us at `hejingwenhejingwen@outlook.com`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVchitect%2FVEnhancer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FVchitect%2FVEnhancer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVchitect%2FVEnhancer/lists"}