{"id":13961504,"url":"https://github.com/tapeinosyne/hyphenation","last_synced_at":"2026-03-17T22:13:56.809Z","repository":{"id":47679759,"uuid":"51881486","full_name":"tapeinosyne/hyphenation","owner":"tapeinosyne","description":"Text hyphenation for Rust","archived":false,"fork":false,"pushed_at":"2024-01-24T12:37:32.000Z","size":6151,"stargazers_count":54,"open_issues_count":6,"forks_count":11,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-14T23:07:13.710Z","etag":null,"topics":["hyphenation","unicode"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tapeinosyne.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-17T00:29:06.000Z","updated_at":"2025-04-19T02:31:23.000Z","dependencies_parsed_at":"2024-01-15T03:59:36.236Z","dependency_job_id":"909c1048-4d25-47fc-a99e-d0e35d7aea41","html_url":"https://github.com/tapeinosyne/hyphenation","commit_stats":{"total_commits":178,"total_committers":11,"mean_commits":"16.181818181818183","dds":0.3764044943820225,"last_synced_commit":"19693c61e21fbd850fb32a0d11013f2e9e805f23"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/tapeinosyne/hyphenation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tapeinosyne%2Fhyphenation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tapeinosyne%2Fhyphenation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tapeinosyne%2Fhyphenation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tapeinosyne%2Fhyphenation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tapeinosyne","download_url":"https://codeload.github.com/tapeinosyne/hyphenation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tapeinosyne%2Fhyphenation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266253505,"owners_count":23900051,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hyphenation","unicode"],"created_at":"2024-08-08T17:01:12.124Z","updated_at":"2025-12-12T14:50:37.436Z","avatar_url":"https://github.com/tapeinosyne.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# `hyphenation`\n\nHyphenation for UTF-8 strings in a variety of languages.\n\n```toml\n[dependencies]\nhyphenation = \"0.8.3\"\n```\n\nTwo strategies are available:\n- Standard Knuth–Liang hyphenation, with dictionaries built from the [TeX UTF-8 patterns](http://www.ctan.org/tex-archive/language/hyph-utf8).\n- Extended (“non-standard”) hyphenation based on László Németh's [Automatic non-standard hyphenation in OpenOffice.org](https://www.tug.org/TUGboat/tb27-1/tb86nemeth.pdf), with dictionaries built from Libre/OpenOffice patterns.\n\n\n## Documentation\n\n[Docs.rs](https://docs.rs/hyphenation)\n\n\n## Usage\n\n### Quickstart\n\nThe `hyphenation` library relies on hyphenation dictionaries, external files that must be loaded into memory. To start with, however, it can be more convenient to embed them in the compiled artifact.\n\n```toml\n[dependencies]\nhyphenation = { version = \"0.8.3\", features = [\"embed_all\"] }\n```\n\nThe topmost module of `hyphenation` offers a small prelude that can be imported to expose the most common functionality.\n\n```rust\nuse hyphenation::*;\n\n// Retrieve the embedded American English dictionary for `Standard` Knuth-Liang hyphenation.\nlet en_us = Standard::from_embedded(Language::EnglishUS) ?;\n\n// Identify valid breaks in the given word.\nlet hyphenated = en_us.hyphenate(\"hyphenation\");\n\n// Word breaks are represented as byte indices into the string.\nlet break_indices = \u0026hyphenated.breaks;\nassert_eq!(break_indices, \u0026[2, 6, 7]);\n\n// The segments of a hyphenated word can be iterated over, marked or unmarked.\nlet marked = hyphenated.iter();\nlet collected : Vec\u003cString\u003e = marked.collect();\nassert_eq!(collected, vec![\"hy-\", \"phen-\", \"a-\", \"tion\"]);\n\nlet unmarked = hyphenated.iter().segments();\nlet collected : Vec\u003c\u0026str\u003e = unmarked.collect();\nassert_eq!(collected, vec![\"hy\", \"phen\", \"a\", \"tion\"]);\n\n// `hyphenate()` is case-insensitive.\nlet uppercase : Vec\u003c_\u003e = en_us.hyphenate(\"CAPITAL\").into_iter().segments().collect();\nassert_eq!(uppercase, vec![\"CAP\", \"I\", \"TAL\"]);\n```\n\n\n### Loading dictionaries at runtime\n\nThe current set of available dictionaries amounts to ~2.8MB of data. Although embedding them is an option, most applications should prefer to load individual dictionaries at runtime, like so:\n\n```rust\nlet path_to_dict = \"/path/to/en-us.bincode\";\nlet english_us = Standard::from_path(Language::EnglishUS, path_to_dict) ?;\n```\n\nOr to embed them individually with [`include_bytes!`](https://doc.rust-lang.org/std/macro.include_bytes.html):\n```rust\nlet bytes = include_bytes!(\"./relative_path_to_dict_from_source_file/de-1996.standard.bincode\");\nlet mut cursor = std::io::Cursor::new(bytes);\nlet german_de = Standard::any_from_reader(\u0026mut cursor)?;\n```\n\nDictionaries bundled with `hyphenation` can be retrieved from the build folder under `target`, and packaged with the final application as desired.\n\n```bash\n$ find target -name \"dictionaries\"\ntarget/debug/build/hyphenation-33034db3e3b5f3ce/out/dictionaries\n```\n\n\n### Segmentation\n\nDictionaries can be used in conjunction with text segmentation to hyphenate words within a text run. This short example uses the [`unicode-segmentation`](https://crates.io/crates/unicode-segmentation) crate for untailored Unicode segmentation.\n\n```rust\nuse unicode_segmentation::UnicodeSegmentation;\n\nlet hyphenate_text = |text : \u0026str| -\u003e String {\n    // Split the text on word boundaries—\n    text.split_word_bounds()\n        // —and hyphenate each word individually.\n        .flat_map(|word| en_us.hyphenate(word).into_iter())\n        .collect()\n};\n\nlet excerpt = \"I know noble accents / And lucid, inescapable rhythms; […]\";\nassert_eq!(\"I know no-ble ac-cents / And lu-cid, in-escapable rhythms; […]\"\n          , hyphenate_text(excerpt));\n```\n\n\n### Normalization\n\nHyphenation patterns for languages affected by normalization occasionally cover multiple forms, at the discretion of their authors, but most often they don’t. If you require `hyphenation` to operate strictly on strings in a known normalization form, as described by the [Unicode Standard Annex #15](http://unicode.org/reports/tr15/) and provided by the [`unicode-normalization`](https://github.com/unicode-rs/unicode-normalization) crate, you may specify it in your Cargo manifest, like so:\n\n```toml\n[dependencies.hyphenation]\nversion = \"0.8.3\"\nfeatures = [\"nfc\"]\n```\n\nThe `features` field may contain exactly *one* of the following normalization options:\n\n- `\"nfc\"`, for canonical composition;\n- `\"nfd\"`, for canonical decomposition;\n- `\"nfkc\"`, for compatibility composition;\n- `\"nfkd\"`, for compatibility decomposition.\n\nYou may prefer to build `hyphenation` in release mode if normalization is enabled, since the bundled hyphenation patterns will need to be reprocessed into dictionaries.\n\n\n## License\n\n`hyphenation` © 2016 tapeinosyne, dual-licensed under the terms of either:\n  - the Apache License, Version 2.0\n  - the MIT license\n\n`hyph-utf8` hyphenation patterns © their respective owners; see their [master files](https://github.com/hyphenation/tex-hyphen/tree/49706f9cfa97f6ead26b473ec10d23d5a651318a/hyph-utf8/tex/generic/hyph-utf8/patterns/tex) for licensing information.\n\n`patterns/hyph-hu.ext.txt` (extended Hungarian hyphenation patterns) is licensed under:\n- MPL 1.1 (refer to `patterns/hyph-hu.ext.lic.txt`)\n\n`patterns/hyph-ca.ext.txt` (extended Catalan hyphenation patterns) is licensed under:\n- LGPL v.3.0 or higher (refer to `patterns/hyph-ca.ext.lic.txt`)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftapeinosyne%2Fhyphenation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftapeinosyne%2Fhyphenation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftapeinosyne%2Fhyphenation/lists"}