{"id":13456616,"url":"https://github.com/collabora/WhisperFusion","last_synced_at":"2025-03-24T11:30:47.625Z","repository":{"id":213149848,"uuid":"732120773","full_name":"collabora/WhisperFusion","owner":"collabora","description":"WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide a seamless conversations with an AI.","archived":false,"fork":false,"pushed_at":"2024-07-31T13:28:41.000Z","size":522,"stargazers_count":1586,"open_issues_count":20,"forks_count":118,"subscribers_count":19,"default_branch":"main","last_synced_at":"2025-03-17T19:21:23.129Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/collabora.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-15T17:37:49.000Z","updated_at":"2025-03-17T16:18:52.000Z","dependencies_parsed_at":"2023-12-18T22:36:59.287Z","dependency_job_id":"3c4fab8a-05b4-4968-8af5-7dea6a7cd1af","html_url":"https://github.com/collabora/WhisperFusion","commit_stats":null,"previous_names":["collabora/whisperbot","collabora/whisperfusion"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FWhisperFusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FWhisperFusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora%2FWhisperFusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collabora
%2FWhisperFusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/collabora","download_url":"https://codeload.github.com/collabora/WhisperFusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245260710,"owners_count":20586444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:01:24.944Z","updated_at":"2025-03-24T11:30:47.317Z","avatar_url":"https://github.com/collabora.png","language":"Python","funding_links":[],"categories":["Python","语音识别与合成_其他","Repos"],"sub_categories":["网络服务_其他"],"readme":"# WhisperFusion\n\n\u003ch2 align=\"center\"\u003e\n  \u003ca href=\"https://www.youtube.com/watch?v=_PnaP0AQJnk\"\u003e\u003cimg\nsrc=\"https://img.youtube.com/vi/_PnaP0AQJnk/0.jpg\" style=\"background-color:rgba(0,0,0,0);\" height=300 alt=\"WhisperFusion\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\u003cbr\u003eSeamless conversations with AI (with ultra-low latency)\u003cbr\u003e\u003cbr\u003e\n\u003c/h2\u003e\n\nWelcome to WhisperFusion. WhisperFusion builds upon the capabilities of\n[WhisperLive](https://github.com/collabora/WhisperLive) and\n[WhisperSpeech](https://github.com/collabora/WhisperSpeech) by\nintegrating Mistral, a Large Language Model (LLM), on top of the\nreal-time speech-to-text pipeline. Both the LLM and\nWhisper are optimized to run efficiently as TensorRT engines, maximizing\nperformance and real-time processing capabilities. 
WhisperSpeech, in turn, is\noptimized with torch.compile.\n\n## Features\n\n- **Real-Time Speech-to-Text**: Uses WhisperLive (built on OpenAI's Whisper) to convert\n  spoken language into text in real time.\n\n- **Large Language Model Integration**: Adds Mistral, a Large Language\n  Model, to enhance the understanding and context of the transcribed\n  text.\n\n- **TensorRT Optimization**: Both the LLM and Whisper are optimized to\n  run as TensorRT engines, ensuring high-performance, low-latency\n  processing.\n\n- **torch.compile**: WhisperSpeech uses torch.compile to speed up\n  inference; torch.compile makes PyTorch code run faster by JIT-compiling\n  it into optimized kernels.\n\n## Hardware Requirements\n\n- A GPU with at least 24 GB of RAM\n- For optimal latency, the GPU should have FP16 (half) TFLOPS similar to those of the RTX 4090. Here are the [hardware specifications](https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889) for the RTX 4090.\n\nThe demo was run on a single RTX 4090 GPU. WhisperFusion uses the Nvidia TensorRT-LLM library for CUDA-optimized versions of popular LLMs. TensorRT-LLM supports multiple GPUs, so it should be possible to run WhisperFusion on multiple GPUs for even better performance.\n\n## Getting Started\nWe provide a Docker Compose setup to streamline the deployment of the pre-built TensorRT-LLM Docker container. This setup includes both Whisper and Phi converted to TensorRT engines, and the WhisperSpeech model is pre-downloaded so you can quickly start interacting with WhisperFusion. 
Additionally, we include a simple web server for the Web GUI.\n\n- Build and run with Docker Compose\n```bash\nmkdir docker/scratch-space\ncp docker/scripts/build-* docker/scripts/run-whisperfusion.sh docker/scratch-space/\n\ndocker compose build\nexport MODEL=Phi-3-mini-4k-instruct    # or Phi-3-mini-128k-instruct or phi-2; WhisperFusion uses phi-2 by default\ndocker compose up\n```\n\n- Start the Web GUI on `http://localhost:8000`\n\n## Contact Us\n\nFor questions or issues, please open an issue or contact us at:\nmarcus.edel@collabora.com, jpc@collabora.com,\nvineet.suryan@collabora.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollabora%2FWhisperFusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcollabora%2FWhisperFusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollabora%2FWhisperFusion/lists"}