{"id":34221452,"url":"https://github.com/dlp3d-ai/audio2face","last_synced_at":"2026-03-10T08:33:08.398Z","repository":{"id":321720775,"uuid":"1015167098","full_name":"dlp3d-ai/audio2face","owner":"dlp3d-ai","description":"Audio2Face is a real-time audio-to-face animation service that converts streaming audio input into synchronized facial animation data. The system uses advanced machine learning models to extract audio features and generate corresponding facial expressions.","archived":false,"fork":false,"pushed_at":"2026-01-12T09:57:33.000Z","size":159,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T18:49:32.786Z","etag":null,"topics":["3d-animation","blendshapes","digital-humanities","fastapi","python","realtime","realtime-chat"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dlp3d-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-07T05:29:02.000Z","updated_at":"2025-12-19T12:49:08.000Z","dependencies_parsed_at":"2025-10-31T06:35:46.136Z","dependency_job_id":null,"html_url":"https://github.com/dlp3d-ai/audio2face","commit_stats":null,"previous_names":["dlp3d-ai/audio2face"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/dlp3d-ai/audio2face","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlp3d-ai%2Faudio2face","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlp3d-ai%2Faudio2face/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlp3d-ai%2Faudio2face/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlp3d-ai%2Faudio2face/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dlp3d-ai","download_url":"https://codeload.github.com/dlp3d-ai/audio2face/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dlp3d-ai%2Faudio2face/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30328251,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-animation","blendshapes","digital-humanities","fastapi","python","realtime","realtime-chat"],"created_at":"2025-12-15T23:15:09.727Z","updated_at":"2026-03-10T08:33:07.081Z","avatar_url":"https://github.com/dlp3d-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio2Face\n\n\u003e **English Documentation** | [中文文档](README_CN.md)\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Data Preparation](#data-preparation)\n- [Quick Start](#quick-start)\n- [Documentation](#documentation)\n- [Citation](#citation)\n- [License](#license)\n\n## Overview\n\nAudio2Face is a real-time audio-to-face animation service that converts streaming audio input into synchronized facial animation data. The system uses advanced machine learning models to extract audio features and generate corresponding facial expressions, supporting both CPU and GPU-accelerated inference for optimal performance.\n\nThe service is designed for real-time applications such as virtual avatars, live streaming, video conferencing, and interactive entertainment platforms where low-latency audio-to-face conversion is essential.\n\n### Key Features\n\n- **Real-time Streaming Processing**: Process audio chunks in real-time with low latency for live applications\n- **Dual Inference Engines**: Support for both ONNX-based Unitalker and PyTorch-based feature extraction\n- **GPU Acceleration**: CUDA 12.1 support for high-performance inference on NVIDIA GPUs\n- **Comprehensive Postprocessing**: 9 postprocessing modules including blendshape clipping, scaling, thresholding, and custom blink animations\n- **Flexible Audio Splitting**: Energy-based silence detection for intelligent audio segmentation\n- **WebSocket API**: FastAPI-based streaming interface with Protocol Buffers serialization\n- **Docker Support**: Pre-built Docker images for both CPU and CUDA environments\n- **Configurable Pipelines**: Modular architecture allowing custom postprocessing pipeline configurations\n- **Multi-threaded Processing**: Asynchronous processing with configurable worker pools\n- **Comprehensive Testing**: Full test coverage with pytest and async testing support\n\n### System Architecture\n\nThe system consists of several key components:\n\n- **Streaming API Layer**: FastAPI-based WebSocket server handling real-time audio streaming requests\n- **Feature Extraction**: Audio feature extraction using Wav2Vec2 models via PyTorch or ONNX runtime\n- **Inference Engine**: ONNX-based Unitalker model for audio-to-blendshape conversion\n- **Audio Splitting**: Energy-based silence detection for intelligent audio segmentation\n- **Postprocessing Pipeline**: Modular postprocessing system with 9 specialized modules:\n  - Blendshape clipping, scaling, and thresholding\n  - Custom blink animation injection\n  - Linear and exponential blending\n  - Offset adjustment and name mapping\n- **Data Structures**: FaceClip class for managing facial animation data with format conversion\n- **Configuration System**: Flexible configuration management for different deployment scenarios\n- **Logging \u0026 Monitoring**: Comprehensive logging with AWS CloudWatch integration support\n\n## Data Preparation\n\nTo use Audio2Face, you need to download the ONNX model file and set up the required directory structure.\n\n### Download ONNX Model\n\n1. **Download the ONNX model file:**\n   - **GitHub Download:** [unitalker_v0.4.0_base.onnx](https://github.com/LazyBusyYang/CatStream/releases/download/a2f_cicd_files/unitalker_v0.4.0_base.onnx)\n   - **Google Drive Download:** [unitalker_v0.4.0_base.onnx](https://drive.google.com/file/d/1E0NTrsh4mciRPb265n64Dd5vR3Sa7Dgx/view?usp=drive_link)\n\n2. **Organize the data:**\n   - Create a `weights` directory in your project root if it doesn't exist\n   - Place the downloaded `unitalker_v0.4.0_base.onnx` file in the `weights` directory\n   - Ensure the following directory structure is created:\n\n```\n├─audio2face\n├─configs\n├─docs\n└─weights\n   └─unitalker_v0.4.0_base.onnx\n```\n\n### Directory Structure Explanation\n\n- `weights/`: A folder for storing ONNX model files.\n- `weights/unitalker_v0.4.0_base.onnx`: The main ONNX model file for audio-to-face conversion.\n\n## Quick Start\n\n### Using Docker\n\nThe easiest way to get started with Audio2Face is using the pre-built Docker image:\n\n**Linux/macOS:**\n```bash\n# Pull and run the pre-built image (CPU version)\ndocker run -it \\\n  -p 18083:18083 \\\n  -v $(pwd)/weights:/workspace/audio2face/weights \\\n  dlp3d/audio2face:latest\n\n# Or run with CUDA support (requires NVIDIA GPU with Docker support)\ndocker run -it \\\n  --gpus all \\\n  -p 18083:18083 \\\n  -v $(pwd)/weights:/workspace/audio2face/weights \\\n  dlp3d/audio2face:latest-cuda12\n```\n\n**Windows:**\n```bash\n# Pull and run the pre-built image\ndocker run -it -p 18083:18083 -v .\\weights:/workspace/audio2face/weights dlp3d/audio2face:latest\n```\n\n**Command Explanation:**\n- `-p 18083:18083`: Maps the container's port 18083 to your host machine's port 18083\n- `-v $(pwd)/weights:/workspace/audio2face/weights` (Linux/macOS): Mounts your local `weights` directory to the container's weights directory\n- `-v .\\weights:/workspace/audio2face/weights` (Windows): Mounts your local `weights` directory to the container's weights directory\n- `dlp3d/audio2face:latest`: Uses the pre-built public image\n\n**Prerequisites:**\n- Ensure you have a `weights` directory in your project root\n- Ensure you have a `weights/unitalker_v0.4.0_base.onnx` file in your `weights` directory\n- Make sure Docker is installed and running on your system\n\n## Documentation\n\nFor detailed documentation, please visit our comprehensive documentation site:\n\n📖 **[Full Documentation](https://dlp3d.readthedocs.io/en/latest/_subrepos/audio2face/overview.html)**\n\nThe documentation provides detailed information on:\n\n- **Data Preparation**: Step-by-step guide for downloading and organizing model files\n- **Quick Start**: Comprehensive Docker setup and local development instructions\n- **Installation Guide**: Detailed environment setup for Linux and Windows\n- **API Documentation**: Complete streaming API reference with request/response formats\n- **Configuration**: Configuration options for different deployment scenarios\n- **Development**: Project structure, testing guidelines, and code quality standards\n\n## Citation\n\nThis project uses the UniTalker algorithm for facial animation generation:\n\n```bibtex\n@article{unitalker2024,\n  title={UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model},\n  journal={ECCV},\n  year={2024}\n}\n```\n\n**Reference**: [UniTalker GitHub Repository](https://github.com/X-niper/UniTalker) - ECCV 2024\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\nThe MIT License is a permissive free software license that allows you to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software with minimal restrictions. The only requirement is that the original copyright notice and license text must be included in all copies or substantial portions of the software.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlp3d-ai%2Faudio2face","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdlp3d-ai%2Faudio2face","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlp3d-ai%2Faudio2face/lists"}