{"id":13677416,"url":"https://github.com/slp-rl/HebTTS","last_synced_at":"2025-04-29T11:30:49.843Z","repository":{"id":242958298,"uuid":"810993678","full_name":"slp-rl/HebTTS","owner":"slp-rl","description":"The official implementation of \"A Language Modeling Approach to Diacritic-Free Hebrew TTS\"","archived":false,"fork":false,"pushed_at":"2024-07-21T11:25:17.000Z","size":1747,"stargazers_count":58,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-02T13:17:54.293Z","etag":null,"topics":["ai","hebrew","slms","tts"],"latest_commit_sha":null,"homepage":"https://pages.cs.huji.ac.il/adiyoss-lab/HebTTS/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slp-rl.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-05T18:29:28.000Z","updated_at":"2024-07-21T21:12:50.000Z","dependencies_parsed_at":"2024-06-14T12:43:57.333Z","dependency_job_id":"78ca4d85-102f-4680-9d30-daee26c3e941","html_url":"https://github.com/slp-rl/HebTTS","commit_stats":null,"previous_names":["slp-rl/hebtts"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slp-rl%2FHebTTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slp-rl%2FHebTTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slp-rl%2FHebTTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slp-rl%2FHebTTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slp-rl","download_url":"https://codeload.github.com/slp-rl/HebTTS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224163567,"owners_count":17266527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","hebrew","slms","tts"],"created_at":"2024-08-02T13:00:41.863Z","updated_at":"2025-04-29T11:30:49.829Z","avatar_url":"https://github.com/slp-rl.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# A Language Modeling Approach to Diacritic-Free Hebrew TTS (Interspeech 2024)\n\nInference code and model weights for the paper \"A Language Modeling Approach to Diacritic-Free Hebrew TTS\" (Interspeech 2024).\n\n\u003cp align=\"center\"\u003e\n\u003ca href='https://arxiv.org/abs/2407.12206'\u003e\u003cimg src='https://img.shields.io/badge/ArXiv-PDF-red'\u003e\u003c/a\u003e\n   \u003ca href='https://pages.cs.huji.ac.il/adiyoss-lab/HebTTS/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e \n   \u003ca href='https://colab.research.google.com/drive/1f3-6Dqbna9_hI5C9V4qTIG05dixW-r72?usp=sharing'\u003e\u003cimg src='https://colab.research.google.com/assets/colab-badge.svg'\u003e\u003c/a\u003e \n   \u003ca href='https://github.com/slp-rl/HebTTS'\u003e\u003cimg src='https://badges.aleen42.com/src/github.svg'\u003e\u003c/a\u003e \n\n\u003c/p\u003e\n\n![](imgs/model.jpg)\n\n___\n**Abstract:** We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics ('Niqqud'), which dictate how words should be pronounced; however, modern Hebrew rarely uses them. Without diacritics, readers are expected to infer the correct pronunciation, and which phonemes to use, from context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this study, we propose a diacritic-free, language-modeling approach to Hebrew TTS. The language model (LM) operates on discrete speech representations and is conditioned on text tokenized with a word-piece tokenizer. We optimize the proposed method using in-the-wild, weakly supervised recordings and compare it to several diacritic-based Hebrew TTS systems. Results suggest that the proposed method outperforms the evaluated baselines in both content preservation and naturalness of the generated speech.\n\n## Try it out!\nYou can try our model in the [Google Colab](https://colab.research.google.com/drive/1f3-6Dqbna9_hI5C9V4qTIG05dixW-r72?usp=sharing) demo.\n\n## Installation\n\n```bash\ngit clone https://github.com/slp-rl/HebTTS.git\n```\n\nWe publish our checkpoint on [Google Drive](https://drive.google.com/file/d/11NoOJzMLRX9q1C_Q4sX0w2b9miiDjGrv/view?usp=share_link). The AR model was trained for 1.2M steps and the NAR model for 200K steps on [HebDB](https://pages.cs.huji.ac.il/adiyoss-lab/HebDB/). You can download it with `gdown` (`pip install gdown`):\n\n```bash\ngdown 11NoOJzMLRX9q1C_Q4sX0w2b9miiDjGrv\n```\n\n### Install Dependencies\n\n```bash\npip install torch torchaudio\npip install torchmetrics\npip install omegaconf\npip install git+https://github.com/lhotse-speech/lhotse\npip install librosa\npip install encodec\npip install phonemizer\npip install audiocraft  # optional, required for Multi Band Diffusion\npip install 'numpy\u003c2'\n```\n\n## Inference\n\nYou can experiment with the model using different speakers and text prompts.\n\nRun `infer.py`:\n\n```bash\npython infer.py --checkpoint checkpoint.pt --output-dir ./out --text \"היי מה קורה\"\n```\n\nYou can also specify the additional arguments `--speaker` and `--top-k`.\n\n### Multi Band Diffusion\n\n\u003e [!TIP]\n\u003e You can use the Multi Band Diffusion (MBD) vocoder to generate higher-quality audio.\n\u003e Install `audiocraft` and set the `--mbd True` flag.\n\n### Text\n\nYou can concatenate several text prompts using `|`. If typing Hebrew in the terminal is inconvenient, you can instead pass the path to a text file containing one prompt per line:\n\n```text\nתגידו גנבו לכם פעם את האוטו ופשוט ידעתם שאין טעם להגיש תלונה במשטרה\nהיי מה קורה\nבראשית היתה חללית מסוג נחתת\n```\n\nIf the prompts above are saved in `example.txt`, run:\n\n```bash\npython infer.py --checkpoint checkpoint.pt --output-dir ./out --text example.txt\n```\n\n### Speakers\n\nYou can use a speaker defined in `speakers.yaml` or add your own by specifying WAV files and their transcriptions in the same format. For example:\n\n```\n--speaker shaul\n```\n\n## Citation\n\n```bibtex\n@article{roth2024language,\n  title={A Language Modeling Approach to Diacritic-Free Hebrew TTS},\n  author={Roth, Amit and Turetzky, Arnon and Adi, Yossi},\n  journal={arXiv preprint arXiv:2407.12206},\n  year={2024}\n}\n```\n\n## Acknowledgments\n- The model code in `valle` is based on the [VALL-E implementation](https://github.com/lifeiteng/vall-e) by Feiteng Li.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslp-rl%2FHebTTS","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslp-rl%2FHebTTS","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslp-rl%2FHebTTS/lists"}