{"id":13744572,"url":"https://github.com/TimoBolkart/voca","last_synced_at":"2025-05-09T03:32:41.253Z","repository":{"id":39459170,"uuid":"182107715","full_name":"TimoBolkart/voca","owner":"TimoBolkart","description":"This codebase demonstrates how to synthesize realistic 3D character animations given an arbitrary speech signal and a static character mesh.","archived":false,"fork":false,"pushed_at":"2024-08-20T10:53:36.000Z","size":15422,"stargazers_count":1134,"open_issues_count":41,"forks_count":274,"subscribers_count":41,"default_branch":"master","last_synced_at":"2024-08-20T23:06:50.412Z","etag":null,"topics":["3d-face","3d-models","animation-sequence","computer-graphics","computer-vision","face-animation","machine-learning","morphable-model","python","python3","tensorflow","voca"],"latest_commit_sha":null,"homepage":"https://voca.is.tue.mpg.de/en","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TimoBolkart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-18T14:56:58.000Z","updated_at":"2024-08-20T10:53:40.000Z","dependencies_parsed_at":"2023-01-21T23:01:35.169Z","dependency_job_id":"67bf0839-e697-469e-97c1-1c64b9b68ea6","html_url":"https://github.com/TimoBolkart/voca","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimoBolkart%2Fvoca","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimoBolkart%2Fvoca/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/TimoBolkart%2Fvoca/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimoBolkart%2Fvoca/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TimoBolkart","download_url":"https://codeload.github.com/TimoBolkart/voca/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224811059,"owners_count":17373910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-face","3d-models","animation-sequence","computer-graphics","computer-vision","face-animation","machine-learning","morphable-model","python","python3","tensorflow","voca"],"created_at":"2024-08-03T05:01:12.039Z","updated_at":"2024-11-15T16:31:19.388Z","avatar_url":"https://github.com/TimoBolkart.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# [VOCA: Voice Operated Character Animation](https://voca.is.tue.mpg.de)\n\nThis is an official [VOCA](https://voca.is.tue.mpg.de) repository.\n\n\u003cp align=\"center\"\u003e \n\u003cimg src=\"gif/speech_driven_animation.gif\"\u003e\n\u003c/p\u003e\n\nVOCA is a simple and generic speech-driven facial animation framework that works across a range of identities. This codebase demonstrates how to synthesize realistic character animations given an arbitrary speech signal and a static character mesh. For details please see the scientific publication\n\n```\nCapture, Learning, and Synthesis of 3D Speaking Styles.\nD. Cudeiro*, T. Bolkart*, C. Laidlaw, A. Ranjan, M. J. 
Black\nComputer Vision and Pattern Recognition (CVPR), 2019\n```\n\nA pre-print of the publication can be found [here](https://ps.is.tuebingen.mpg.de/uploads_file/attachment/attachment/510/paper_final.pdf).\nYou can also check out the [VOCA Blender Addon](https://github.com/SasageyoOrg/voca-blender).\n\n## Video\n\nSee the demo video for more details and results.\n\n[![VOCA](https://img.youtube.com/vi/XceCxf_GyW4/0.jpg)](https://youtu.be/XceCxf_GyW4)\n\n## Set-up\n\nThe code uses Python 3.6.8 and was tested with TensorFlow 1.14.0.\n\nInstall pip and virtualenv:\n```\nsudo apt-get install python3-pip python3-venv\n```\n\nInstall ffmpeg:\n```\nsudo apt install ffmpeg\n```\n\nClone the git project:\n```\ngit clone https://github.com/TimoBolkart/voca.git\n```\n\nSet up virtual environment:\n```\nmkdir \u003cyour_home_dir\u003e/.virtualenvs\npython3 -m venv \u003cyour_home_dir\u003e/.virtualenvs/voca\n```\n\nActivate virtual environment:\n```\ncd voca\nsource \u003cyour_home_dir\u003e/.virtualenvs/voca/bin/activate\n```\n\nSet the right pip version:\n```\npip install -U pip==22.0.4\n```\n\nInstall mesh processing libraries from [MPI-IS/mesh](https://github.com/MPI-IS/mesh) within the virtual environment.\n\nFinally, the requirements (including TensorFlow) can be installed using:\n```\npip install -r requirements.txt\n```\n\n## Data\n\n#### Data to run the demo \n\nDownload the trained VOCA model, audio sequences, and template meshes from [MPI-IS/VOCA](https://voca.is.tue.mpg.de).\u003cbr/\u003e\nDownload the FLAME model from [MPI-IS/FLAME](http://flame.is.tue.mpg.de/).\u003cbr/\u003e\nDownload the trained DeepSpeech model (v0.1.0) from [Mozilla/DeepSpeech](https://github.com/mozilla/DeepSpeech/releases/tag/v0.1.0) (i.e. 
deepspeech-0.1.0-models.tar.gz).\n\nTo download and prepare these data, run:\n```\n./fetch_data.sh\n```\n\n#### Data used to train VOCA\n\nVOCA is trained on VOCASET, a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. VOCASET can be downloaded at [MPI-IS/VOCASET](https://voca.is.tue.mpg.de). \n\nTraining subjects:\n```\nFaceTalk_170728_03272_TA, FaceTalk_170904_00128_TA, FaceTalk_170725_00137_TA, FaceTalk_170915_00223_TA, FaceTalk_170811_03274_TA, FaceTalk_170913_03279_TA, FaceTalk_170904_03276_TA, FaceTalk_170912_03278_TA\n```\nThis is also the order of the subjects for the one-hot encoding (i.e. FaceTalk_170728_03272_TA: 0, FaceTalk_170904_00128_TA: 1, ...).\n\nValidation subjects:\n```\nFaceTalk_170811_03275_TA, FaceTalk_170908_03277_TA\n```\n\nTest subjects:\n```\nFaceTalk_170809_00138_TA, FaceTalk_170731_00024_TA \n```\n\n## Demo\n\nWe provide demos to\n1) synthesize a character animation given a speech signal (VOCA),\n2) add eye blinks, and alter the identity-dependent face shape and head pose of an animation sequence using FLAME, and\n3) generate templates (e.g. 
by sampling the [FLAME](http://flame.is.tue.mpg.de/) identity shape space, or by reconstructing a template from an image using [RingNet](https://github.com/soubhiksanyal/RingNet)) that can be animated with VOCA.\n\n##### VOCA output\n\nThis demo runs VOCA, which outputs the animation meshes given audio sequences, and renders the animation sequence to a video.\n```\npython run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --out_path './animation_output'\n```\n\nTo run VOCA and visualize the meshes with a pre-defined texture (obtained by fitting FLAME to an image using [TF_FLAME](https://github.com/TimoBolkart/TF_FLAME)), run:\n```\npython run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --uv_template_fname './template/texture_mesh.obj' --texture_img_fname './template/texture_mesh.png' --out_path './animation_output_textured'\n```\n\nBy default, the demo uses pyrender to render the sequence to a video; however, this causes problems for certain configurations (e.g. if running the code remotely). In this case, try running the demo with the additional flag ```--visualize False``` to disable the visualization. The animated meshes are still stored in the output directory and can be viewed or rendered with another tool.\n\n##### Edit VOCA output\n\nVOCA outputs meshes in FLAME topology in \"zero pose\". This allows editing the output sequences by varying the FLAME model parameters. 
These demos show how to use FLAME to add eye blinks, edit the identity-dependent face shape, or edit the head pose of a VOCA animation sequence.\n\nAdd eye blinks (FLAME2019 only):\n```\npython edit_sequences.py --source_path './animation_output/meshes' --out_path './FLAME_eye_blink' --flame_model_path  './flame/generic_model.pkl' --mode blink --num_blinks 2 --blink_duration 15\n```\nPlease note that this only demonstrates how to use FLAME to manipulate the eyelids to blink. The output motion does not resemble a true eye blink. This demo works with the FLAME2019 model only.\n\nEdit identity-dependent shape:\n```\npython edit_sequences.py --source_path './animation_output/meshes' --out_path './FLAME_variation_shape' --flame_model_path  './flame/generic_model.pkl' --mode shape --index 0 --max_variation 3\n```\n\nEdit head pose:\n```\npython edit_sequences.py --source_path './animation_output/meshes' --out_path './FLAME_variation_pose' --flame_model_path  './flame/generic_model.pkl' --mode pose --index 3 --max_variation 0.52\n```\n\n##### Render sequence\n\nThis demo renders an animation sequence to a video.\n```\npython visualize_sequence.py --sequence_path './FLAME_eye_blink/meshes' --audio_fname './audio/test_sentence.wav' --out_path './FLAME_eye_blink'\npython visualize_sequence.py --sequence_path './FLAME_variation_shape/meshes' --audio_fname './audio/test_sentence.wav' --out_path './FLAME_variation_shape'\npython visualize_sequence.py --sequence_path './FLAME_variation_pose/meshes' --audio_fname './audio/test_sentence.wav' --out_path './FLAME_variation_pose'\n```\nTo visualize the sequences with a pre-defined texture, additionally specify the flags ```--uv_template_fname``` and ```--texture_img_fname``` as done for the run_voca demo.\n\n##### Compute FLAME parameters\n\nVOCA outputs meshes in FLAME topology in \"zero pose\". This demo shows how to compute the FLAME parameters for such a sequence. 
\n```\npython compute_FLAME_params.py --source_path './animation_output/meshes' --params_fname './FLAME_parameters/params.npy' --flame_model_path  './flame/generic_model.pkl' --template_fname './template/FLAME_sample.ply' \n```\nThe ```--template_fname``` must specify the template provided to VOCA to generate the sequence. \n\nTo reconstruct the FLAME meshes from the sequence parameters:\n```\npython compute_FLAME_params.py --params_fname './FLAME_parameters/params.npy' --flame_model_path  './flame/generic_model.pkl' --out_path './FLAME_parameters/meshes' \n```\n\n\n##### Sample template\n\nVOCA animates static templates in FLAME topology. Such templates can be obtained by fitting FLAME to scans or images, or by sampling the FLAME shape space. This demo randomly samples the FLAME identity shape space to generate new templates.\n```\npython sample_templates.py --flame_model_path './flame/generic_model.pkl' --num_samples 1 --out_path './template'\n```\n\n##### Reconstruct template from images\n\n[RingNet](https://ringnet.is.tue.mpg.de/) is a framework to fully automatically reconstruct 3D meshes in FLAME topology from an image. After removing the effects of pose and expression, the RingNet output mesh can be used as a VOCA template. Please see the RingNet [demo](https://github.com/soubhiksanyal/RingNet) on how to reconstruct a 3D mesh from an image with neutralized pose and expression.\n\n## Training\n\nWe provide code to train a VOCA model. Prior to training, run the VOCA output demo, as the training shares the requirements.\nAdditionally, download the VOCA training data from [MPI-IS/VOCA](https://voca.is.tue.mpg.de).\u003cbr/\u003e\n\nThe training code requires a config file containing all model training parameters. 
To create a config file, run\n```\npython config_parser.py\n```\n\nTo start training, run\n```\npython run_training.py\n```\n\nTo visualize the training progress, run\n```\ntensorboard --logdir='./training/summaries/' --port 6006\n```\nThis prints a [link](http://localhost:6006/) on the command line. Open the link in a web browser to show the visualization.\n\n## Known issues\n\nIf you get an error like\n```\nModuleNotFoundError: No module named 'psbody'\n```\nplease check that [MPI-IS/mesh](https://github.com/MPI-IS/mesh) is successfully installed within the virtual environment.\n\n## License\n\nFree for non-commercial and scientific research purposes. By using this code, you acknowledge that you have read the license terms (https://voca.is.tue.mpg.de/license.html), understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not use the code.\n\n\n## Referencing VOCA\n\nIf you find this code useful for your research, or you use results generated by VOCA in your research, please cite the following paper:\n\n```\n@article{VOCA2019,\n    title = {Capture, Learning, and Synthesis of {3D} Speaking Styles},\n    author = {Cudeiro, Daniel and Bolkart, Timo and Laidlaw, Cassidy and Ranjan, Anurag and Black, Michael},\n    journal = {Computer Vision and Pattern Recognition (CVPR)},\n    pages = {10101--10111},\n    year = {2019},\n    url = {http://voca.is.tue.mpg.de/}\n}\n```\n\n## Acknowledgement\n\nWe thank Raffi Enficiaud and Ahmed Osman for pushing the release of psbody.mesh.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTimoBolkart%2Fvoca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTimoBolkart%2Fvoca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTimoBolkart%2Fvoca/lists"}