{"id":13672620,"url":"https://github.com/open-i18n/rust-unic","last_synced_at":"2025-12-12T13:35:56.658Z","repository":{"id":33946283,"uuid":"94862886","full_name":"open-i18n/rust-unic","owner":"open-i18n","description":"UNIC: Unicode and Internationalization Crates for Rust","archived":false,"fork":false,"pushed_at":"2023-08-26T10:51:28.000Z","size":14886,"stargazers_count":234,"open_issues_count":63,"forks_count":24,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-05-02T02:07:14.202Z","etag":null,"topics":["cldr","crates","internationalization","locale-data","rust","text-processing","unic","unicode","unicode-algorithms","unicode-characters"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/unic","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/open-i18n.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-APACHE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS"}},"created_at":"2017-06-20T07:33:45.000Z","updated_at":"2024-04-15T16:23:29.000Z","dependencies_parsed_at":"2024-01-06T07:50:08.135Z","dependency_job_id":"7a69033f-461e-49e5-9396-5d475a896c53","html_url":"https://github.com/open-i18n/rust-unic","commit_stats":{"total_commits":614,"total_committers":8,"mean_commits":76.75,"dds":"0.40065146579804556","last_synced_commit":"41cd2b01e2e51a2b8d7d32fd13b14c0c092ac014"},"previous_names":["behnam/rust-unic"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-i18n%2Frust-unic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-i18n%2Fru
st-unic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-i18n%2Frust-unic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-i18n%2Frust-unic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/open-i18n","download_url":"https://codeload.github.com/open-i18n/rust-unic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247730069,"owners_count":20986404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cldr","crates","internationalization","locale-data","rust","text-processing","unic","unicode","unicode-algorithms","unicode-characters"],"created_at":"2024-08-02T09:01:41.966Z","updated_at":"2025-12-12T13:35:51.597Z","avatar_url":"https://github.com/open-i18n.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# UNIC: Unicode and Internationalization Crates for 
Rust\n\n[![UNIC-logo](docs/images/UNIC-logo.png)](https://github.com/open-i18n/rust-unic/)\n\n[![Travis](https://img.shields.io/travis/open-i18n/rust-unic/master.svg?label=Linux%20build)](https://travis-ci.org/open-i18n/rust-unic/)\n[![Rust-1.45+](https://img.shields.io/badge/rustc-1.45+-red.svg#MIN_RUST_VERSION)](https://www.rust-lang.org/)\n[![Unicode-10.0.0](https://img.shields.io/badge/unicode-10.0.0-red.svg)](https://www.unicode.org/versions/Unicode10.0.0/)\n[![Release](https://img.shields.io/github/release/open-i18n/rust-unic.svg)](https://github.com/open-i18n/rust-unic/)\n[![Crates.io](https://img.shields.io/crates/v/unic.svg)](https://crates.io/crates/unic/)\n[![Documentation](https://docs.rs/unic/badge.svg)](https://docs.rs/unic/)\n[![Gitter](https://img.shields.io/gitter/room/open-i18n/rust-unic.svg)](https://gitter.im/open-i18n/rust-unic)\n\n\u003chttps://github.com/open-i18n/rust-unic\u003e\n\n**UNIC** is a project to develop components for the Rust programming language\nto provide high-quality and easy-to-use crates for Unicode\nand Internationalization data and algorithms. 
In other words, it's like\n[ICU](http://site.icu-project.org/) for Rust, written completely in Rust, mostly\nin *safe* mode, but also benefiting from performance gains of *unsafe* mode when\npossible.\n\nSee [UNIC Changelog](CHANGELOG.md) for the latest release details.\n\n\n## Project Goal\n\nThe goal for UNIC is to provide access to all levels of Unicode and\nInternationalization functionalities, starting from Unicode character\nproperties, to Unicode algorithms for processing text, and more advanced\n(locale-based) processes based on Unicode Common Locale Data Repository (CLDR).\n\nOther standards and best practices, like IETF RFCs, are also implemented, as\nneeded by Unicode/CLDR components, or common demand.\n\n\n## Project Status\n\nAt the moment UNIC is under heavy development: the API is updated frequently on\n`master` branch, and there will be API breakage between each `0.x` release.\nPlease see [open issues](https://github.com/open-i18n/rust-unic/issues) for changes\nplanned.\n\nWe expect to have the `1.0` version released in 2018 and maintain a stable API\nafterwards, with possibly one or two API updates per year for the first couple\nof years.\n\n\n## Design Goals\n\n1.  Primary goal of UNIC is to provide reliable functionality by way of\n    easy-to-use API. Therefore, newly added components may not be\n    well-optimized for performance, but will have enough tests to show\n    conformance to the standard, and examples to show users how they can be\n    used to address common needs.\n\n2.  Next major goal for UNIC components is performance and low binary and memory\n    footprints. Especially, optimizing runtime for ASCII and other common cases\n    will encourage adoption without fear of slowing down regular development\n    processes.\n\n3.  Components are guaranteed, to the extent possible, to provide consistent\n    data and algorithms. 
Cross-component tests are used to catch any\n    inconsistency between implementations, without slowing down development\n    processes.\n\n\n## Components and their Organization\n\nUNIC *Components* have a hierarchical organization, starting from the\n[`unic`](unic/) root, containing the *major components*. Each major component, in\nturn, may host some *minor components*.\n\nThe APIs of major components are designed for the end-users of the libraries, and\nare expected to be extensively documented and accompanied by code examples.\n\nIn contrast to major components, minor components act as providers of data and\nalgorithms for the higher-level components, and their API is expected to be more\nperformant, possibly providing multiple ways of accessing the data.\n\n### The UNIC Super-Crate\n\nThe [`unic`](https://crates.io/crates/unic) super-crate is a collection of all\nUNIC (major) components, providing an easy way of access to all functionalities,\nwhen all or many are needed, instead of importing components one-by-one. 
This\ncrate ensures all components imported are compatible in algorithms and\nconsistent data-wise.\n\nMain code examples and cross-component integration tests are implemented under\nthis crate.\n\n### Major Components\n\n-   [`unic-char`](unic/char/): Unicode Character Tools.\n    [![Crates.io](https://img.shields.io/crates/v/unic-char.svg)](https://crates.io/crates/unic-char/)\n\n-   [`unic-ucd`](unic/ucd/): Unicode Character Database\n    ([UAX\\#44](https://unicode.org/reports/tr44/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-ucd.svg)](https://crates.io/crates/unic-ucd/)\n\n-   [`unic-bidi`](unic/bidi/): Unicode Bidirectional Algorithm\n    ([UAX\\#9](https://unicode.org/reports/tr9/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-bidi.svg)](https://crates.io/crates/unic-bidi/)\n\n-   [`unic-normal`](unic/normal/): Unicode Normalization Forms\n    ([UAX\\#15](https://unicode.org/reports/tr15/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-normal.svg)](https://crates.io/crates/unic-normal/)\n\n-   [`unic-segment`](unic/segment/): Unicode Text Segmentation Algorithms\n    ([UAX\\#29](https://unicode.org/reports/tr29/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-segment.svg)](https://crates.io/crates/unic-segment/)\n\n-   [`unic-idna`](unic/idna/): Unicode IDNA Compatibility Processing\n    ([UTS\\#46](https://unicode.org/reports/tr46/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-idna.svg)](https://crates.io/crates/unic-idna/)\n\n-   [`unic-emoji`](unic/emoji/): Unicode Emoji\n    ([UTS\\#51](https://unicode.org/reports/tr51/)).\n    [![Crates.io](https://img.shields.io/crates/v/unic-emoji.svg)](https://crates.io/crates/unic-emoji/)\n\n### Applications\n\n-   [`unic-cli`](apps/cli): UNIC Command-Line Tools\n    [![Crates.io](https://img.shields.io/crates/v/unic-cli.svg)](https://crates.io/crates/unic-cli/)\n\n\n## Code Organization: Combined Repository\n\nSome of the reasons to have a combined 
repository for these components are:\n\n*   **Faster development**. Implementing new Unicode/i18n components very often\n    depends on other (lower level) components, which in turn may need\n    adjustments (exposing new API, fixing bugs, etc.) that can be developed, tested and\n    reviewed in fewer cycles and less time.\n\n*   **Implementation Integrity**. Multiple dependencies on other components\n    mean that the components need to, to some level, agree with each other.\n    Many Unicode algorithms, composed from smaller ones, assume that all parts\n    of the algorithm are using the same version of Unicode data. Violation of\n    this assumption can cause inconsistencies and hard-to-catch bugs. In a\n    combined repository, it's possible to achieve better integrity during\n    development, as well as with cross-component (integration) tests.\n\n*   **Pay for what you need.** Small components (basic crates), which\n    cross-depend only on what they need, allow users to only bring in what they\n    consume in their project.\n\n*   **Shared bootstrapping.** A considerable part of extending Unicode/i18n\n    functionalities depends on converting source Unicode/locale data into\n    structured formats for the destination programming language. 
In a combined\n    repository, it's easier to maintain these bootstrapping tools, expand\n    coverage, and use better data structures for more efficiency.\n\n\n## Documentation\n\n* [Unicode and Rust](docs/Unicode_and_Rust.md)\n* [UNIC Versioning](docs/Versioning.md)\n* [UNIC Unicode API](docs/Unicode_API.md)\n* [UNIC API Guideline](docs/API_Guideline.md)\n* [UNIC API Reference](https://docs.rs/unic/) (autogenerated on *docs.rs*)\n\n\n## How to Use UNIC\n\nIn `Cargo.toml`:\n\n```toml\n[dependencies]\nunic = \"0.9.0\"  # This has Unicode 10.0.0 data and algorithms\n```\n\nAnd in `main.rs`:\n\n```rust\nextern crate unic;\n\nuse unic::ucd::common::is_alphanumeric;\nuse unic::bidi::BidiInfo;\nuse unic::normal::StrNormalForm;\nuse unic::segment::{GraphemeIndices, Graphemes, WordBoundIndices, WordBounds, Words};\nuse unic::ucd::normal::compose;\nuse unic::ucd::{is_cased, Age, BidiClass, CharAge, CharBidiClass, StrBidiClass, UnicodeVersion};\n\nfn main() {\n\n    // Age\n\n    assert_eq!(Age::of('A').unwrap().actual(), UnicodeVersion { major: 1, minor: 1, micro: 0 });\n    assert_eq!(Age::of('\\u{A0000}'), None);\n    assert_eq!(\n        Age::of('\\u{10FFFF}').unwrap().actual(),\n        UnicodeVersion { major: 2, minor: 0, micro: 0 }\n    );\n\n    if let Some(age) = '🦊'.age() {\n        assert_eq!(age.actual().major, 9);\n        assert_eq!(age.actual().minor, 0);\n        assert_eq!(age.actual().micro, 0);\n    }\n\n    // Bidi\n\n    let text = concat![\n        \"א\",\n        \"ב\",\n        \"ג\",\n        \"a\",\n        \"b\",\n        \"c\",\n    ];\n\n    assert!(!text.has_bidi_explicit());\n    assert!(text.has_rtl());\n    assert!(text.has_ltr());\n\n    assert_eq!(text.chars().nth(0).unwrap().bidi_class(), BidiClass::RightToLeft);\n    assert!(!text.chars().nth(0).unwrap().is_ltr());\n    assert!(text.chars().nth(0).unwrap().is_rtl());\n\n    assert_eq!(text.chars().nth(3).unwrap().bidi_class(), BidiClass::LeftToRight);\n    
assert!(text.chars().nth(3).unwrap().is_ltr());\n    assert!(!text.chars().nth(3).unwrap().is_rtl());\n\n    let bidi_info = BidiInfo::new(text, None);\n    assert_eq!(bidi_info.paragraphs.len(), 1);\n\n    let para = \u0026bidi_info.paragraphs[0];\n    assert_eq!(para.level.number(), 1);\n    assert_eq!(para.level.is_rtl(), true);\n\n    let line = para.range.clone();\n    let display = bidi_info.reorder_line(para, line);\n    assert_eq!(\n        display,\n        concat![\n            \"a\",\n            \"b\",\n            \"c\",\n            \"ג\",\n            \"ב\",\n            \"א\",\n        ]\n    );\n\n    // Case\n\n    assert_eq!(is_cased('A'), true);\n    assert_eq!(is_cased('א'), false);\n\n    // Normalization\n\n    assert_eq!(compose('A', '\\u{030A}'), Some('Å'));\n\n    let s = \"ÅΩ\";\n    let c = s.nfc().collect::\u003cString\u003e();\n    assert_eq!(c, \"ÅΩ\");\n\n    // Segmentation\n\n    assert_eq!(\n        Graphemes::new(\"a\\u{310}e\\u{301}o\\u{308}\\u{332}\").collect::\u003cVec\u003c\u0026str\u003e\u003e(),\n        \u0026[\"a\\u{310}\", \"e\\u{301}\", \"o\\u{308}\\u{332}\"]\n    );\n\n    assert_eq!(\n        Graphemes::new(\"a\\r\\nb🇺🇳🇮🇨\").collect::\u003cVec\u003c\u0026str\u003e\u003e(),\n        \u0026[\"a\", \"\\r\\n\", \"b\", \"🇺🇳\", \"🇮🇨\"]\n    );\n\n    assert_eq!(\n        GraphemeIndices::new(\"a̐éö̲\\r\\n\").collect::\u003cVec\u003c(usize, \u0026str)\u003e\u003e(),\n        \u0026[(0, \"a̐\"), (3, \"é\"), (6, \"ö̲\"), (11, \"\\r\\n\")]\n    );\n\n    assert_eq!(\n        Words::new(\n            \"The quick (\\\"brown\\\") fox can't jump 32.3 feet, right?\",\n            |s: \u0026\u0026str| s.chars().any(is_alphanumeric),\n        ).collect::\u003cVec\u003c\u0026str\u003e\u003e(),\n        \u0026[\"The\", \"quick\", \"brown\", \"fox\", \"can't\", \"jump\", \"32.3\", \"feet\", \"right\"]\n    );\n\n    assert_eq!(\n        WordBounds::new(\"The quick (\\\"brown\\\")  
fox\").collect::\u003cVec\u003c\u0026str\u003e\u003e(),\n        \u0026[\"The\", \" \", \"quick\", \" \", \"(\", \"\\\"\", \"brown\", \"\\\"\", \")\", \" \", \" \", \"fox\"]\n    );\n\n    assert_eq!(\n        WordBoundIndices::new(\"Brr, it's 29.3°F!\").collect::\u003cVec\u003c(usize, \u0026str)\u003e\u003e(),\n        \u0026[\n            (0, \"Brr\"),\n            (3, \",\"),\n            (4, \" \"),\n            (5, \"it's\"),\n            (9, \" \"),\n            (10, \"29.3\"),\n            (14, \"°\"),\n            (16, \"F\"),\n            (17, \"!\")\n        ]\n    );\n}\n```\n\nYou can find more examples under [`examples`](examples/) and [`tests`](tests/)\ndirectories. (And more to be added as UNIC expands...)\n\n\n## License\n\nLicensed under either of\n\n * Apache License, Version 2.0\n   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n * MIT license\n   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n\n### Contribution\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in the work by you, as defined in the Apache-2.0 license, shall be\ndual licensed as above, without any additional terms or conditions.\n\n\n## Code of Conduct\n\nUNIC project follows **The Rust Code of Conduct**. You can find a copy of it in\n[CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) or online at\n\u003chttps://www.rust-lang.org/conduct.html\u003e.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-i18n%2Frust-unic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-i18n%2Frust-unic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-i18n%2Frust-unic/lists"}