https://github.com/zsxkib/cog-hibiki
Cog wrapper for Hibiki: High-Fidelity Simultaneous Speech-To-Speech Translation
- Host: GitHub
- URL: https://github.com/zsxkib/cog-hibiki
- Owner: zsxkib
- License: MIT
- Created: 2025-02-10T16:59:31.000Z
- Default Branch: main
- Last Pushed: 2025-02-11T09:27:03.000Z
- Last Synced: 2025-02-11T10:31:27.179Z
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Hibiki: Real-Time Speech Translation
[[Paper]][hibiki] | [[Samples]](https://huggingface.co/spaces/kyutai/hibiki-samples) | [[HuggingFace Models]](https://huggingface.co/collections/kyutai/hibiki-fr-en-67a48835a3d50ee55d37c2b5)
Hibiki is a state-of-the-art model for **real-time speech-to-speech translation** that preserves the speaker's voice characteristics in the translated audio. It currently supports French-to-English translation and can run locally on consumer hardware.
## Quick Start
Run translation with a single command using Cog:
```bash
sudo cog predict -i audio_input=@sample_fr_hibiki_crepes.mp3
```
This translates the sample French audio file into English while preserving the speaker's voice characteristics. To translate your own audio, replace the sample with the path to your own `.mp3` file.
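For scripting translations over your own files, the Quick Start command can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper names `build_cog_command` and `run_translation` are hypothetical, and only the `audio_input` input name and the `cog predict -i` form come from the Quick Start command. Whether `sudo` is needed depends on how Docker is set up on your machine, so it is exposed as a flag.

```python
import subprocess
from pathlib import Path


def build_cog_command(audio_path, use_sudo=True):
    """Build the argument list for the `cog predict` call from Quick Start.

    Hypothetical helper for illustration; it is not shipped with this repo.
    """
    # Cog takes file inputs in the form name=@path.
    command = ["cog", "predict", "-i", f"audio_input=@{Path(audio_path)}"]
    # The Quick Start command uses sudo; drop it if your user can run Docker directly.
    return ["sudo", *command] if use_sudo else command


def run_translation(audio_path, use_sudo=True):
    """Run the prediction; raises CalledProcessError if Cog exits non-zero."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")
    subprocess.run(build_cog_command(path, use_sudo), check=True)


# Show the command that would be run for a custom file (nothing is executed here).
print(" ".join(build_cog_command("my_french_clip.mp3")))
# sudo cog predict -i audio_input=@my_french_clip.mp3
```

Passing the command as a list rather than a shell string avoids quoting problems when a file name contains spaces.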
## Key Features
- 🎙️ **Voice preservation** through classifier-free guidance
- ⏱️ **Real-time processing** at a 12.5 Hz frame rate
- 🔊 **Natural-sounding output** in target language
- 📜 Simultaneous text transcription

[hibiki]: https://arxiv.org/abs/2502.03382
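To put the 12.5 Hz frame rate from the feature list in concrete terms, the sketch below works through the arithmetic: 12.5 frames per second means each frame spans 80 ms of audio. The function names are hypothetical, and this README does not specify what a frame contains inside the model, so treat this as unit conversion only.

```python
import math

FRAME_RATE_HZ = 12.5  # frame rate quoted in the feature list


def frame_duration_ms(frame_rate_hz=FRAME_RATE_HZ):
    """Length of one frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for_duration(seconds, frame_rate_hz=FRAME_RATE_HZ):
    """Number of frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * frame_rate_hz)


print(frame_duration_ms())        # 80.0
print(frames_for_duration(10.0))  # 125
```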