{"id":25328563,"url":"https://github.com/lucadellalib/focalcodec","last_synced_at":"2025-07-04T12:05:57.910Z","repository":{"id":276712808,"uuid":"926440925","full_name":"lucadellalib/focalcodec","owner":"lucadellalib","description":"A low-bitrate single-codebook 16 kHz speech codec based on focal modulation","archived":false,"fork":false,"pushed_at":"2025-02-10T01:44:40.000Z","size":7503,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-10T02:29:15.410Z","etag":null,"topics":["codec","deep-learning","focal-modulation","neural-speech-coding","pytorch","speech-synthesis","vector-quantization","vocos","wavlm"],"latest_commit_sha":null,"homepage":"https://lucadellalib.github.io/focalcodec-web/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucadellalib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-03T09:03:40.000Z","updated_at":"2025-02-10T01:52:19.000Z","dependencies_parsed_at":"2025-02-10T02:39:17.757Z","dependency_job_id":null,"html_url":"https://github.com/lucadellalib/focalcodec","commit_stats":null,"previous_names":["lucadellalib/focalcodec"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Ffocalcodec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Ffocalcodec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Ffocalcodec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucadellalib%2Ffocalcodec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucadellalib","download_url":"https://codeload.github.com/lucadellalib/focalcodec/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247773736,"owners_count":20993634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codec","deep-learning","focal-modulation","neural-speech-coding","pytorch","speech-synthesis","vector-quantization","vocos","wavlm"],"created_at":"2025-02-14T02:56:14.597Z","updated_at":"2025-04-08T04:03:58.081Z","avatar_url":"https://github.com/lucadellalib.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ⚡ FocalCodec\n\n![License](https://img.shields.io/github/license/lucadellalib/focalcodec)\n![Stars](https://img.shields.io/github/stars/lucadellalib/focalcodec?style=social)\n\nA low-bitrate single-codebook 16 kHz speech codec based on [focal modulation](https://arxiv.org/abs/2203.11926).\n\n- 📜 **Preprint**: https://arxiv.org/abs/2502.04465\n\n- 🌐 **Project Page**: https://lucadellalib.github.io/focalcodec-web/\n\n- 🔊 **Downstream Tasks**: https://github.com/lucadellalib/audiocodecs\n\n\u003cimg src=\"docs/_static/images/focalcodec.png\" width=\"700\"\u003e\n\n---------------------------------------------------------------------------------------------------------\n\n## 📌 Available Checkpoints\n\n|                                       Checkpoint                                        | Token Rate (Hz) | Bitrate (kbps) |   Dataset   |\n|:---------------------------------------------------------------------------------------:|:---------------:|:--------------:|:-----------:|\n|   [lucadellalib/focalcodec_50hz](https://huggingface.co/lucadellalib/focalcodec_50hz)   |       50.0      |      0.65      | LibriTTS960 |\n|   [lucadellalib/focalcodec_25hz](https://huggingface.co/lucadellalib/focalcodec_25hz)   |      25.0       |      0.33      | LibriTTS960 |\n| [lucadellalib/focalcodec_12_5hz](https://huggingface.co/lucadellalib/focalcodec_12_5hz) |      12.5       |      0.16      | LibriTTS960 |\n\n---------------------------------------------------------------------------------------------------------\n\n## 🛠️️ Installation\n\nFirst of all, install [Python 3.8 or later](https://www.python.org). Then, open a terminal and run:\n\n```\npip install huggingface-hub safetensors soundfile torch torchaudio\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## ▶️ Quickstart\n\n**NOTE**: the `audio-samples` directory contains audio samples that you can download and use to test the codec.\n\nYou can easily load the model using `torch.hub` without cloning the repository:\n\n```python\nimport torch\nimport torchaudio\n\n# Load FocalCodec model\nconfig = \"lucadellalib/focalcodec_50hz\"\ncodec = torch.hub.load(\n    \"lucadellalib/focalcodec\", \"focalcodec\", config=config, force_reload=True\n)\ncodec.eval().requires_grad_(False)\n\n# Load and preprocess the input audio\naudio_file = \"audio-samples/librispeech-dev-clean/251-118436-0003.wav\"\nsig, sample_rate = torchaudio.load(audio_file)\nsig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate)\n\n# Encode audio into tokens\ntoks = codec.sig_to_toks(sig)  # Shape: (batch, time)\nprint(toks.shape)\nprint(toks)\n\n# Convert tokens to their corresponding binary spherical codes\ncodes = codec.toks_to_codes(toks)  # Shape: (batch, time, log2 codebook_size)\nprint(codes.shape)\nprint(codes)\n\n# Decode tokens back into a waveform\nrec_sig = codec.toks_to_sig(toks)\n\n# Save the reconstructed audio\nrec_sig = torchaudio.functional.resample(rec_sig, codec.sample_rate, sample_rate)\ntorchaudio.save(\"reconstruction.wav\", rec_sig, sample_rate)\n```\n\nAlternatively, you can install FocalCodec as a standard Python package using `pip`:\n\n```bash\npip install focalcodec@git+https://github.com/lucadellalib/focalcodec.git@main#egg=focalcodec\n```\n\nOnce installed, you can import it in your scripts:\n\n```python\nimport focalcodec\n\nconfig = \"lucadellalib/focalcodec_50hz\"\ncodec = focalcodec.FocalCodec.from_pretrained(config)\n```\n\nCheck the code documentation for more details on model usage and available configurations.\n\n---------------------------------------------------------------------------------------------------------\n\n## 🎤 Running the Demo Script\n\nClone or download and extract the repository, navigate to `\u003cpath-to-repository\u003e`, open a terminal and run:\n\n**Speech Resynthesis**\n\n```bash\npython demo.py \\\n--input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \\\n--output_file reconstruction.wav\n```\n\n**Voice Conversion**\n\n```bash\npython demo.py \\\n--input_file audio-samples/librispeech-dev-clean/251-118436-0003.wav \\\n--output_file reconstruction.wav \\\n--reference_files audio-samples/librispeech-dev-clean/84\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## @ Citing\n\n```\n@article{dellalibera2025focalcodec,\n    title   = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},\n    author  = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},\n    journal = {arXiv preprint arXiv:2502.04465},\n    year    = {2025},\n}\n```\n\n---------------------------------------------------------------------------------------------------------\n\n## 📧 Contact\n\n[luca.dellalib@gmail.com](mailto:luca.dellalib@gmail.com)\n\n---------------------------------------------------------------------------------------------------------\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Ffocalcodec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucadellalib%2Ffocalcodec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucadellalib%2Ffocalcodec/lists"}