https://github.com/ryanfowler/rfc9839-rs
Validation of RFC 9839 Unicode subsets in Rust.
- Host: GitHub
- URL: https://github.com/ryanfowler/rfc9839-rs
- Owner: ryanfowler
- License: apache-2.0
- Created: 2025-08-23T15:30:28.000Z
- Default Branch: main
- Last Pushed: 2025-09-09T23:10:45.000Z
- Last Synced: 2025-12-29T17:43:35.529Z
- Language: Rust
- Homepage:
- Size: 13.7 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# rfc9839
[crates.io](https://crates.io/crates/rfc9839) | [docs.rs](https://docs.rs/rfc9839)
Validation of [RFC 9839](https://www.rfc-editor.org/rfc/rfc9839) Unicode subsets in Rust.
RFC 9839 defines three nested subsets of Unicode characters for use in text protocols:
- **Unicode Scalars** – all code points except UTF-16 surrogates.
*Every Rust `char` is already a scalar value; checks are included for completeness and for raw byte validation.*
- **XML Characters** – `{ TAB, LF, CR } ∪ [0x20–0xD7FF] ∪ [0xE000–0xFFFD] ∪ [0x10000–0x10FFFF]`.
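*This is the XML “Char” production: it excludes surrogates, the C0 controls other than TAB/LF/CR, and U+FFFE/U+FFFF, but still admits DEL, the C1 controls, and the remaining noncharacters.*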
- **Unicode Assignables** – “not problematic” characters: the useful controls TAB, LF, and CR, printable ASCII (U+0020–U+007E),
and every scalar from U+00A0 upward that is or could later be assigned. This excludes DEL, the C1 controls, and the standardized noncharacters (…FFFE/FFFF in each plane and U+FDD0–FDEF).
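The three subsets above can be written as plain range checks on a code point. The following is a minimal sketch based on the ranges listed here and in RFC 9839, not the crate's actual source; the function names `scalar_cp`, `xml_cp`, and `assignable_cp` are hypothetical and chosen only for illustration.

```rust
/// Unicode Scalars: every code point except the surrogates U+D800..=U+DFFF.
/// (Hypothetical helper, not part of the rfc9839 crate.)
fn scalar_cp(cp: u32) -> bool {
    matches!(cp, 0x0..=0xD7FF | 0xE000..=0x10FFFF)
}

/// XML Characters: TAB, LF, CR, plus three contiguous ranges.
/// (Hypothetical helper, not part of the rfc9839 crate.)
fn xml_cp(cp: u32) -> bool {
    matches!(
        cp,
        0x9 | 0xA | 0xD | 0x20..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF
    )
}

/// Unicode Assignables: no DEL, no C1 controls, no surrogates, no noncharacters.
/// (Hypothetical helper, not part of the rfc9839 crate.)
fn assignable_cp(cp: u32) -> bool {
    match cp {
        // Useful controls and printable ASCII.
        0x9 | 0xA | 0xD | 0x20..=0x7E => true,
        // Rest of the BMP, skipping C1 controls, surrogates, U+FDD0..=U+FDEF,
        // and U+FFFE/U+FFFF.
        0xA0..=0xD7FF | 0xE000..=0xFDCF | 0xFDF0..=0xFFFD => true,
        // Supplementary planes: the last two code points of each plane
        // (...FFFE and ...FFFF) are noncharacters.
        0x10000..=0x10FFFF => (cp & 0xFFFF) < 0xFFFE,
        _ => false,
    }
}
```

Under these definitions U+007F (DEL) and U+FDD0 pass the XML check but fail the assignable check, which shows how each subset sits strictly inside the previous one.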
---
## Features
- **Character-level APIs**: `is_unicode_scalar_char`, `is_xml_char`, `is_unicode_assignable_char`
- **String-level APIs**: `is_unicode_scalar`, `is_xml_chars`, `is_unicode_assignable`
- **Byte-level APIs**: `is_unicode_scalar_bytes`, `is_xml_chars_bytes`, `is_unicode_assignable_bytes`
- **ASCII fast-path**: tight loops for ASCII data, falling back to `chars()` only after the first non-ASCII byte
- **Zero allocations**, no lookup tables
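To illustrate the ASCII fast-path and byte-level ideas from the list above, here is a rough sketch for the XML Characters subset. It is an assumption about how such a check could be structured, not the crate's implementation: the names `xml_chars_sketch` and `xml_chars_bytes_sketch` are hypothetical, and treating invalid UTF-8 as a failed check is likewise an assumption.

```rust
/// Hypothetical string-level check with an ASCII fast path.
fn xml_chars_sketch(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    // Fast path: stay in a byte loop while the input is ASCII.
    while i < bytes.len() && bytes[i] < 0x80 {
        // Within ASCII, the XML subset allows TAB, LF, CR, and 0x20..=0x7F.
        if !matches!(bytes[i], b'\t' | b'\n' | b'\r' | 0x20..=0x7F) {
            return false;
        }
        i += 1;
    }
    // Slow path: everything before `i` was ASCII, so `i` is a char boundary
    // and slicing here cannot panic.
    s[i..].chars().all(|c| {
        matches!(
            c,
            '\t' | '\n' | '\r'
                | '\u{20}'..='\u{D7FF}'
                | '\u{E000}'..='\u{FFFD}'
                | '\u{10000}'..='\u{10FFFF}'
        )
    })
}

/// Hypothetical byte-level check: assumes invalid UTF-8 simply fails.
fn xml_chars_bytes_sketch(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => xml_chars_sketch(s),
        Err(_) => false,
    }
}
```

Neither function allocates or consults a table: the checks are range comparisons on bytes and `char` values, in the spirit of the feature list.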
## Example
```rust
use rfc9839::*;

fn main() {
    // Scalars: always true for a `&str`, which cannot contain surrogates
    assert!(is_unicode_scalar("hello 🌍"));

    // XML Characters
    assert!(is_xml_chars("ok\tline\n"));
    assert!(!is_xml_chars("\u{0000}")); // NUL is disallowed

    // Unicode Assignables
    assert!(is_unicode_assignable("emoji 👍"));
    assert!(!is_unicode_assignable("\u{007F}")); // DEL is excluded
}
```