{"id":31947982,"url":"https://github.com/hcoles/voices","last_synced_at":"2025-10-14T11:50:21.809Z","repository":{"id":316647837,"uuid":"1064214693","full_name":"hcoles/voices","owner":"hcoles","description":"Fast, in-process text to speech for Java","archived":false,"fork":false,"pushed_at":"2025-10-05T09:10:35.000Z","size":7648,"stargazers_count":15,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-05T09:17:57.719Z","etag":null,"topics":["java","onnx","piper","piper-tts","tts"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hcoles.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-25T17:51:19.000Z","updated_at":"2025-10-05T08:55:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"478bc4a5-3863-4ad2-8a71-f567615096fa","html_url":"https://github.com/hcoles/voices","commit_stats":null,"previous_names":["hcoles/voices"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/hcoles/voices","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcoles%2Fvoices","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcoles%2Fvoices/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcoles%2Fvoices/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcoles%2Fvoices/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hcoles","download_url":"https://codeload.github.com/hcoles/voices/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcoles%2Fvoices/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279019069,"owners_count":26086518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","onnx","piper","piper-tts","tts"],"created_at":"2025-10-14T11:50:20.268Z","updated_at":"2025-10-14T11:50:21.802Z","avatar_url":"https://github.com/hcoles.png","language":"Java","funding_links":[],"categories":["人工智能"],"sub_categories":[],"readme":"# Voices\n\nFast in-process text to speech for Java 17 and above. No external APIs. No system dependencies.\n\n* [sample 1](https://github.com/user-attachments/assets/3bb91fe5-682a-498b-ab38-3f4e0d1885f6)\n* [sample 2](https://github.com/user-attachments/assets/3ff5dd48-df3f-4b47-9b4e-e88f97bf6d4d)\n\n
# What is this?\n\nAn easy-to-use local text-to-speech library for Java.\n\nIt can produce reasonable-quality audio on low-specced hardware.\n\nIt provides several components:\n\n* Code to run the voice models from the [piper](https://github.com/rhasspy/piper) project\n* A piper-compatible pure Java phonemizer for English, partially ported from [phonemize](https://github.com/hans00/phonemize)\n* Compatible phoneme dictionaries for UK and US English\n* A multi-lingual phonemizer using the [onnx model](https://huggingface.co/OpenVoiceOS/g2p-mbyt5-12l-ipa-childes-espeak-onnx) from OpenVoiceOS\n* A small number of piper models available as dependencies on Maven Central\n* Code to download other models not uploaded to Central\n\nThe models are run using the ONNX Runtime library, so they can run on either the CPU or the GPU.\n\n
## Releases\n\nSee [Releases](https://github.com/hcoles/voices/releases).\n\n
## English-Only Usage With Rules-Based Phonemizer\n\nUsing Voices requires three code dependencies and one or more models.\n\n```xml\n\u003c!-- main dependency --\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.pitest.voices\u003c/groupId\u003e\n    \u003cartifactId\u003echorus\u003c/artifactId\u003e\n    \u003cversion\u003e0.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n\u003c!-- a prepackaged model --\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.pitest.voices\u003c/groupId\u003e\n    \u003cartifactId\u003ealba\u003c/artifactId\u003e\n    \u003cversion\u003e0.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n\u003c!-- dictionary of pronunciations --\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.pitest.voices\u003c/groupId\u003e\n    \u003cartifactId\u003een_uk\u003c/artifactId\u003e \u003c!-- or en_us --\u003e\n    \u003cversion\u003e0.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n\u003c!-- runtime for onnx models --\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.microsoft.onnxruntime\u003c/groupId\u003e\n    \u003cartifactId\u003eonnxruntime\u003c/artifactId\u003e \u003c!-- or onnxruntime_gpu --\u003e\n    \u003cversion\u003e1.22.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nTechnically, the rules-based phonemizer can be used without a dictionary, but the quality of the speech would be poor.\n\nThe `Chorus` class acts as a manager for voice models, handling the loading and freeing of resources. Loading is an expensive\noperation, so it is recommended to keep a single instance of `Chorus` for the lifetime of your application.\n\n```java\nChorusConfig config = chorusConfig(EnUkDictionary.en_uk());\ntry (Chorus chorus = new Chorus(config)) {\n    Voice alba = chorus.voice(Alba.albaMedium());\n\n    Audio audio = alba.say(\"Hello there, I'm vaguely Scottish.\");\n    audio.save(outputPath); // outputPath: destination for the generated audio\n}\n```\n\nThe example above uses a model retrieved at build time as a normal Maven dependency.\n\nA wider range of models can be retrieved at runtime by adding the model downloader dependency:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.pitest.voices\u003c/groupId\u003e\n    \u003cartifactId\u003emodel-downloader\u003c/artifactId\u003e\n    \u003cversion\u003e0.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nModels can then be retrieved using the factory methods on the following classes:\n\n* `org.pitest.voices.download.Models`\n* `org.pitest.voices.download.UsModels`\n* `org.pitest.voices.download.NonEnglishModels`\n\nBy default, voice models are downloaded to `~/.cache/voices/`, but this can be configured in `ChorusConfig`.\n\n
## Multi-lingual Usage\n\nThe OpenVoice phonemizer is much more capable than the rules-based one. It can be used without a dictionary to\ncreate good-quality speech in multiple languages (including English).\n\nIt is more heavyweight, using a 50 MB model (compared with 3 MB for a dictionary file), and is more computationally\nexpensive.\n\nTo use it, add the dependency:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.pitest.voices\u003c/groupId\u003e\n    \u003cartifactId\u003eopenvoice-phonemizer\u003c/artifactId\u003e\n    \u003cversion\u003e0.0.7\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nThen select the phonemizer with:\n\n```java\nChorusConfig config = chorusConfig(Dictionary.empty())\n        .withModel(new OpenVoiceSupplier());\n```\n\n
## Running on GPU\n\nModels can be run on the GPU instead of the CPU by using the `onnxruntime_gpu` dependency instead of `onnxruntime`. It is\nimportant that only the `onnxruntime_gpu` dependency is on the classpath. If the standard `onnxruntime` is also present, the model\nwill fail to load onto the GPU.\n\nTo run on the GPU, create the configuration with `gpuChorusConfig`:\n\n```java\nChorusConfig config = gpuChorusConfig(EnUkDictionary.en_uk());\n```\n\nThis runs the model on GPU 0 with no other options set. More complex setups can be configured using the `withCudaOptions`\nmethod on `ChorusConfig`.\n\n
## Pauses\n\nVoices will add pauses when it encounters the following symbols:\n\n* Markdown `#` style headings\n* Markdown `---` section breaks\n* Em dashes and Markdown `---` em dashes\n* En dashes and Markdown `--` en dashes\n\nThe defaults can be adjusted via the `ChorusConfig` class.\n\n
## Homographs\n\nAlthough its dictionary of homographs (words with the same spelling but different meanings and, sometimes, different pronunciations)\nis currently small, Voices handles homographs quite well thanks to its use of the part-of-speech tagging provided by the OpenNLP library. It sometimes performs better than piper and espeak-ng.\n\nPhrases such as the following are all pronounced correctly:\n\n* *I moped on my moped.*\n* *I rebel because I am a rebel.*\n* *Sow the seeds for the sow to eat.*\n\n
## Licensing\n\nMost of the project is licensed under Apache 2. The en_uk dictionary is released under GPL 3 due to a cautious\ninterpretation of the licensing terms of the espeak-ng tool, which was used to generate much of its content.\n\nAlthough the GPL does not generally apply to the output of a program, it seems probable that feeding a word list\nto espeak-ng will result in it regurgitating a significant proportion of its own internal dictionary.\n\nThe en_us dictionary is of lower quality, but is generated by transforming the CMU dictionary, which, whilst copyrighted by\nCarnegie Mellon University, is free to use so long as its copyright is acknowledged.\n\nThe models from the piper project are not part of this project and may have their own usage restrictions. Please\ncheck that their terms suit your use case.\n\n
## Alternatives\n\n### Sherpa Onnx\n\nThe [Sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) project can also run piper models.\n\nWhen this project was started, sherpa-onnx was difficult to consume because it was not available from Maven Central and required\nmanual installation of native libraries. It also seemed to handle homographs poorly.\n\nThis situation may have since improved.\n\n### Mary TTS\n\n[Mary TTS](https://github.com/marytts/marytts) is very mature and produces reasonable-quality speech; however, it sounds a little\nrobotic by modern standards.\n\n
## Development\n\nAlthough much of the ported logic is not well tested, there is a smattering of tests to prevent major regressions\nwhile changing things, and a few tests that play audio by default to allow experimentation.\n\nIf you're building from the command line, the audio can be disabled with:\n\n```bash\nmvn -Dsilent=true install\n```\n\n## Background\n\nI created this library to narrate my own writing as part of my editing loop. Initially\nit called the sherpa native libraries, but I kept coming back to the idea of writing a pure\nJava phonemizer, as it would allow some degree of control over pauses, which are important\nfor narrating fiction.\n\nI have no background in text to speech or linguistics, so much of the functionality relies on work\nby other, better-qualified people.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcoles%2Fvoices","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhcoles%2Fvoices","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcoles%2Fvoices/lists"}