https://github.com/stranger6667/unicode-intervals
Search for Unicode code points intervals by including/excluding categories, ranges, and custom characters sets.
- Host: GitHub
- URL: https://github.com/stranger6667/unicode-intervals
- Owner: Stranger6667
- License: apache-2.0
- Created: 2023-04-20T19:23:04.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-04-27T12:04:34.000Z (about 3 years ago)
- Last Synced: 2024-10-19T10:44:50.373Z (over 1 year ago)
- Language: Rust
- Size: 135 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE-APACHE
unicode-intervals
=================
[GitHub](https://github.com/Stranger6667/unicode-intervals) |
[crates.io](https://crates.io/crates/unicode-intervals) |
[docs.rs](https://docs.rs/unicode-intervals) |
[CI](https://github.com/Stranger6667/unicode-intervals/actions?query=branch%3Amain) |
[Coverage](https://app.codecov.io/github/Stranger6667/unicode-intervals)
This library provides a way to search for Unicode code point intervals by categories, ranges, and custom character sets.
The main purpose of `unicode-intervals` is to simplify generating strings that match specific criteria.
```toml
[dependencies]
unicode-intervals = "0.2"
```
## Examples
The example below produces the code point intervals of uppercase and lowercase letters with code points up to 128, plus the `☃` character.
```rust
use unicode_intervals::UnicodeCategory;

let intervals = unicode_intervals::query()
    .include_categories(
        UnicodeCategory::UPPERCASE_LETTER |
        UnicodeCategory::LOWERCASE_LETTER
    )
    .max_codepoint(128)
    .include_characters("☃")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(65, 90), (97, 122), (9731, 9731)]);
```
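Since the stated goal is string generation, it may help to see how such intervals can be turned into characters. The following is a minimal sketch using only the standard library; `chars_in` is a hypothetical helper, not part of the `unicode-intervals` API, and it assumes the intervals are inclusive `(start, end)` pairs as in the assertion above.

```rust
/// Expand inclusive `(start, end)` code point intervals into the characters they cover.
/// Hypothetical helper for illustration; not part of `unicode-intervals`.
/// Values that are not valid Unicode scalar values (such as surrogates) are skipped.
fn chars_in(intervals: &[(u32, u32)]) -> Vec<char> {
    intervals
        .iter()
        // Each pair is treated as an inclusive range of code points
        .flat_map(|&(start, end)| start..=end)
        // `char::from_u32` returns `None` for surrogates and out-of-range values
        .filter_map(char::from_u32)
        .collect()
}
```

Calling `chars_in(&[(65, 90), (97, 122), (9731, 9731)])` yields the 26 uppercase letters, the 26 lowercase letters, and `☃`, which a generator could then sample from.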
Use `IntervalSet` for index-like access to the underlying codepoints:
```rust
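use unicode_intervals::UnicodeCategory;

let interval_set = unicode_intervals::query()
    // Restrict the set to uppercase letters so that index 10 maps to 'K'
    .include_categories(UnicodeCategory::UPPERCASE_LETTER)
    .max_codepoint(128)
    .interval_set()
    .expect("Invalid query input");
// Get the codepoint at index 10 in this interval set
assert_eq!(interval_set.codepoint_at(10), Some('K' as u32));
assert_eq!(interval_set.index_of('K'), Some(10));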
```
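To make the idea of index-like access concrete, here is a std-only sketch of how an index can be mapped to a code point across several inclusive intervals, and back. The functions `nth_codepoint` and `position_of` are hypothetical names chosen for illustration; this is not the crate's implementation, and it assumes the intervals are sorted, non-overlapping, and satisfy `start <= end`.

```rust
/// Return the code point at `index`, counting through the intervals in order.
/// Hypothetical helper; assumes sorted, non-overlapping, inclusive intervals.
fn nth_codepoint(intervals: &[(u32, u32)], index: u32) -> Option<u32> {
    let mut remaining = index;
    for &(start, end) in intervals {
        let size = end - start + 1;
        if remaining < size {
            return Some(start + remaining);
        }
        // Skip this interval and keep counting in the next one
        remaining -= size;
    }
    None
}

/// Inverse lookup: the position of `codepoint` within the intervals, if present.
fn position_of(intervals: &[(u32, u32)], codepoint: u32) -> Option<u32> {
    let mut offset = 0;
    for &(start, end) in intervals {
        if (start..=end).contains(&codepoint) {
            return Some(offset + (codepoint - start));
        }
        offset += end - start + 1;
    }
    None
}
```

With the intervals `[(65, 90), (97, 122)]`, `nth_codepoint(.., 10)` gives `Some(75)` (the letter `K`) and `position_of(.., 97)` gives `Some(26)`, because `a` follows the 26 uppercase letters.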
Query a specific Unicode version:
```rust
use unicode_intervals::UnicodeVersion;

let intervals = UnicodeVersion::V11_0_0.query()
    .max_codepoint(128)
    .include_characters("☃")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(0, 128), (9731, 9731)]);
```
Restrict the output to code points within a certain range:
```rust
let intervals = unicode_intervals::query()
    .min_codepoint(65)
    .max_codepoint(128)
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(65, 128)]);
```
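Conceptually, a range restriction like this amounts to clamping each interval to the allowed bounds and dropping intervals that fall outside them. The sketch below shows that idea with the standard library only; `clamp_intervals` is a hypothetical function and is not claimed to match how the crate applies `min_codepoint` and `max_codepoint` internally.

```rust
/// Clamp inclusive intervals to `[min, max]`, dropping those entirely outside the bounds.
/// Hypothetical helper for illustration; both bounds are treated as inclusive.
fn clamp_intervals(intervals: &[(u32, u32)], min: u32, max: u32) -> Vec<(u32, u32)> {
    intervals
        .iter()
        .filter_map(|&(start, end)| {
            let start = start.max(min);
            let end = end.min(max);
            // An interval that became empty after clamping is discarded
            (start <= end).then_some((start, end))
        })
        .collect()
}
```

For example, clamping `[(0, 128), (9731, 9731)]` to the bounds 65 and 128 leaves `[(65, 128)]`.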
Include or exclude specific characters:
```rust
use unicode_intervals::UnicodeCategory;

let intervals = unicode_intervals::query()
    .include_categories(UnicodeCategory::PARAGRAPH_SEPARATOR)
    .include_characters("-123")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(45, 45), (49, 51), (8233, 8233)]);
```
## Unicode version support
`unicode-intervals` supports Unicode 9.0.0 - 15.0.0.
## License
Licensed under either of Apache License, Version
2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.