{"id":13603178,"url":"https://github.com/Zejun-Yang/AniPortrait","last_synced_at":"2025-04-11T14:30:33.816Z","repository":{"id":229779430,"uuid":"775873810","full_name":"Zejun-Yang/AniPortrait","owner":"Zejun-Yang","description":"AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation","archived":false,"fork":false,"pushed_at":"2024-07-02T02:22:52.000Z","size":56122,"stargazers_count":4923,"open_issues_count":83,"forks_count":610,"subscribers_count":63,"default_branch":"main","last_synced_at":"2025-04-10T03:50:15.835Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Zejun-Yang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-22T08:07:20.000Z","updated_at":"2025-04-10T03:15:32.000Z","dependencies_parsed_at":"2024-10-14T17:40:51.938Z","dependency_job_id":"6cf0a01d-fd4f-455c-931c-9d2d2bc28994","html_url":"https://github.com/Zejun-Yang/AniPortrait","commit_stats":null,"previous_names":["zejun-yang/aniportrait"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zejun-Yang%2FAniPortrait","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zejun-Yang%2FAniPortrait/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zejun-Yang%2FAniPortrait/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zejun-Yang%2FAniPortrait/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/Zejun-Yang","download_url":"https://codeload.github.com/Zejun-Yang/AniPortrait/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248419647,"owners_count":21100213,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T18:01:55.603Z","updated_at":"2025-04-11T14:30:33.770Z","avatar_url":"https://github.com/Zejun-Yang.png","language":"Python","funding_links":[],"categories":["Python","\u003cspan id=\"avatar\"\u003eAvatar\u003c/span\u003e","Projects","📦 Legacy \u0026 Inactive Projects","Repos"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","🎥 Video"],"readme":"# AniPortrait\n\n**AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations**\n\nAuthor: Huawei Wei, Zejun Yang, Zhisheng Wang\n\nOrganization: Tencent Games Zhiji, Tencent\n\n![zhiji_logo](asset/zhiji_logo.png)\n\nHere we propose AniPortrait, a novel framework for generating high-quality animation driven by \naudio and a reference portrait image. 
You can also provide a video to achieve face reenactment.\n\n\u003ca href='https://arxiv.org/abs/2403.17694'\u003e\u003cimg src='https://img.shields.io/badge/Paper-Arxiv-red'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/ZJYang/AniPortrait/tree/main'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange'\u003e\u003c/a\u003e\n\u003ca href='https://huggingface.co/spaces/ZJYang/AniPortrait_official'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-green'\u003e\u003c/a\u003e\n\n## Pipeline\n\n![pipeline](asset/pipeline.png)\n\n## Updates / TODO List\n\n- ✅ [2024/03/27] Our paper is now available on arXiv.\n\n- ✅ [2024/03/27] Updated the code to generate pose_temp.npy for head pose control.\n\n- ✅ [2024/04/02] Added a new pose retargeting strategy for vid2vid. We now support substantial pose differences between ref_image and the source video.\n\n- ✅ [2024/04/03] We released our Gradio [demo](https://huggingface.co/spaces/ZJYang/AniPortrait_official) on Hugging Face Spaces (thanks to the HF team for their free GPU support)!\n\n- ✅ [2024/04/07] Added a frame interpolation module to accelerate inference. You can now add -acc to inference commands for faster video generation.\n\n- ✅ [2024/04/21] We have released the audio2pose model and [pre-trained weights](https://huggingface.co/ZJYang/AniPortrait/tree/main) for audio2video. 
Please update the code and download the weight file to try it out.\n\n## Various Generated Videos\n\n### Self driven\n\n\u003ctable class=\"center\"\u003e\n\u003ctr\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/82c0f0b0-9c7c-4aad-bf0e-27e6098ffbe1\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/51a502d9-1ce2-48d2-afbe-767a0b9b9166\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n### Face reenactment\n\n\u003ctable class=\"center\"\u003e\n\u003ctr\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/d4e0add6-20a2-4f4b-808c-530a6f4d3331\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/849fce22-0db1-4257-a75f-a5dc655e6b9e\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\nVideo Source: [鹿火CAVY from bilibili](https://www.bilibili.com/video/BV1H4421F7dE/?spm_id_from=333.337.search-card.all.click)\n\n### Audio driven\n\n\u003ctable class=\"center\"\u003e\n\u003ctr\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/63171e5a-e4c1-4383-8f20-9764524928d0\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop 
src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/6fd74024-ba19-4f6b-b37a-10df5cf2c934\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/9e516cc5-bf09-4d45-b5e3-820030764982\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n    \u003ctd width=50% style=\"border: none\"\u003e\n        \u003cvideo controls autoplay loop src=\"https://github.com/Zejun-Yang/AniPortrait/assets/21038147/7c68148b-8022-453f-be9a-c69590038197\" muted=\"false\"\u003e\u003c/video\u003e\n    \u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Installation\n\n### Build environment\n\nWe recommend Python \u003e=3.10 and CUDA 11.7. Then build the environment as follows:\n\n```shell\npip install -r requirements.txt\n```\n\n### Download weights\n\nAll the weights should be placed under the `./pretrained_weights` directory. You can download the weights manually as follows:\n\n1. Download our trained [weights](https://huggingface.co/ZJYang/AniPortrait/tree/main), which include the following parts: `denoising_unet.pth`, `reference_unet.pth`, `pose_guider.pth`, `motion_module.pth`, `audio2mesh.pt`, `audio2pose.pt` and `film_net_fp16.pt`. You can also download them from [wisemodel](https://wisemodel.cn/models/zjyang8510/AniPortrait).\n\n2. 
Download the pretrained weights of the base models and other components: \n    - [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)\n    - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)\n    - [image_encoder](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder)\n    - [wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)\n\nFinally, these weights should be organized as follows:\n\n```text\n./pretrained_weights/\n|-- image_encoder\n|   |-- config.json\n|   `-- pytorch_model.bin\n|-- sd-vae-ft-mse\n|   |-- config.json\n|   |-- diffusion_pytorch_model.bin\n|   `-- diffusion_pytorch_model.safetensors\n|-- stable-diffusion-v1-5\n|   |-- feature_extractor\n|   |   `-- preprocessor_config.json\n|   |-- model_index.json\n|   |-- unet\n|   |   |-- config.json\n|   |   `-- diffusion_pytorch_model.bin\n|   `-- v1-inference.yaml\n|-- wav2vec2-base-960h\n|   |-- config.json\n|   |-- feature_extractor_config.json\n|   |-- preprocessor_config.json\n|   |-- pytorch_model.bin\n|   |-- README.md\n|   |-- special_tokens_map.json\n|   |-- tokenizer_config.json\n|   `-- vocab.json\n|-- audio2mesh.pt\n|-- audio2pose.pt\n|-- denoising_unet.pth\n|-- film_net_fp16.pt\n|-- motion_module.pth\n|-- pose_guider.pth\n`-- reference_unet.pth\n```\n\nNote: If you have already downloaded some of the pretrained models, such as `StableDiffusion V1.5`, you can specify their paths in the config file (e.g. `./configs/prompts/animation.yaml`).\n\n\n## Gradio Web UI\n\nYou can try out our web demo with the following command. 
We also provide an online demo \u003ca href='https://huggingface.co/spaces/ZJYang/AniPortrait_official'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-green'\u003e\u003c/a\u003e on Hugging Face Spaces.\n\n\n```shell\npython -m scripts.app\n```\n\n## Inference\n\nNote that you can set `-L` to the desired number of frames to generate, for example, `-L 300`.\n\n**Acceleration method**: If it takes a long time to generate a video, you can download [film_net_fp16.pt](https://huggingface.co/ZJYang/AniPortrait/tree/main) and put it under the `./pretrained_weights` directory. Then add `-acc` to the command.\n\nHere are the CLI commands for running the inference scripts:\n\n### Self driven\n\n```shell\npython -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -acc\n```\n\nYou can refer to the format of animation.yaml to add your own reference images or pose videos. To convert a raw video into a pose video (keypoint sequence), run the following command:\n\n```shell\npython -m scripts.vid2pose --video_path pose_video_path.mp4\n```\n\n### Face reenactment\n\n```shell\npython -m scripts.vid2vid --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -acc\n```\n\nAdd source face videos and reference images to animation_facereenac.yaml.\n\n### Audio driven\n\n```shell\npython -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512 -acc\n```\n\nAdd audio files and reference images to animation_audio.yaml.\n\nDeleting `pose_temp` from `./configs/prompts/animation_audio.yaml` enables the audio2pose model.\n\nYou can also use this command to generate a pose_temp.npy for head pose control:\n\n```shell\npython -m scripts.generate_ref_pose --ref_video ./configs/inference/head_pose_temp/pose_ref_video.mp4 --save_path ./configs/inference/head_pose_temp/pose.npy\n```\n\n## Training\n\n### Data preparation\nDownload [VFHQ](https://liangbinxie.github.io/projects/vfhq/) 
and [CelebV-HQ](https://github.com/CelebV-HQ/CelebV-HQ).\n\nExtract keypoints from the raw videos and write the training JSON file (here is an example of processing VFHQ): \n\n```shell\npython -m scripts.preprocess_dataset --input_dir VFHQ_PATH --output_dir SAVE_PATH --training_json JSON_PATH\n```\n\nUpdate these lines in the training config file: \n\n```yaml\ndata:\n  json_path: JSON_PATH\n```\n\n### Stage1\n\nRun command:\n\n```shell\naccelerate launch train_stage_1.py --config ./configs/train/stage1.yaml\n```\n\n### Stage2\n\nPut the pretrained motion module weights `mm_sd_v15_v2.ckpt` ([download link](https://huggingface.co/guoyww/animatediff/blob/main/mm_sd_v15_v2.ckpt)) under `./pretrained_weights`. \n\nSpecify the stage1 training weights in the config file `stage2.yaml`, for example:\n\n```yaml\nstage1_ckpt_dir: './exp_output/stage1'\nstage1_ckpt_step: 30000 \n```\n\nRun command:\n\n```shell\naccelerate launch train_stage_2.py --config ./configs/train/stage2.yaml\n```\n\n## Acknowledgements\n\nWe first thank the authors of [EMO](https://github.com/HumanAIGC/EMO); some of the images and audio clips in our demos are from EMO. 
Additionally, we would like to thank the contributors to the [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [magic-animate](https://github.com/magic-research/magic-animate), [animatediff](https://github.com/guoyww/AnimateDiff) and [Open-AnimateAnyone](https://github.com/guoqincode/Open-AnimateAnyone) repositories, for their open research and exploration.\n\n## Citation\n\n```\n@misc{wei2024aniportrait,\n      title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations}, \n      author={Huawei Wei and Zejun Yang and Zhisheng Wang},\n      year={2024},\n      eprint={2403.17694},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZejun-Yang%2FAniPortrait","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FZejun-Yang%2FAniPortrait","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZejun-Yang%2FAniPortrait/lists"}