Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vallentin/char-ranges
https://github.com/vallentin/char-ranges
Last synced: 24 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/vallentin/char-ranges
- Owner: vallentin
- License: mit
- Created: 2023-06-04T13:52:47.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-04-16T04:03:18.000Z (7 months ago)
- Last Synced: 2024-08-25T09:15:49.453Z (2 months ago)
- Language: Rust
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# char-ranges
[![CI](https://github.com/vallentin/char-ranges/workflows/CI/badge.svg)](https://github.com/vallentin/char-ranges/actions?query=workflow%3ACI)
[![Latest Version](https://img.shields.io/crates/v/char-ranges.svg)](https://crates.io/crates/char-ranges)
[![Docs](https://docs.rs/char-ranges/badge.svg)](https://docs.rs/char-ranges)
[![License](https://img.shields.io/github/license/vallentin/char-ranges.svg)](https://github.com/vallentin/char-ranges)Similar to the standard library's [`.char_indicies()`], but instead of only
producing the start byte position. This library implements [`.char_ranges()`],
that produce both the start and end byte positions.Note that simply using [`.char_indicies()`] and creating a range by mapping the
returned index `i` to `i..(i + 1)` is not guaranteed to be valid. Given that
some UTF-8 characters can be up to 4 bytes.| Char | Bytes | Range |
| :---: | :---: | :----: |
| `'O'` | 1 | `0..1` |
| `'Ø'` | 2 | `0..2` |
| `'∈'` | 3 | `0..3` |
| `'🌏'` | 4 | `0..4` |_Assumes encoded in UTF-8._
The implementation specializes [`last()`], [`nth()`], [`next_back()`],
and [`nth_back()`]. Such that the length of intermediate characters is
not wastefully calculated.## Example
```rust
use char_ranges::CharRangesExt;let text = "Hello 🗻∈🌏";
let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
```## `DoubleEndedIterator`
[`CharRanges`] also implements [`DoubleEndedIterator`] making it possible to iterate backwards.
```rust
use char_ranges::CharRangesExt;let text = "ABCDE";
let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");assert_eq!(chars.next(), None);
```## Offset Ranges
If the input `text` is a substring of some original text, and the produced
ranges are desired to be offset in relation to the substring. Then instead
of [`.char_ranges()`] use[.char_ranges_offset]\(offset)
or.[char_ranges]\().[offset]\(offset)
.```rust
use char_ranges::CharRangesExt;let text = "Hello 👋 World 🌏";
let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
```[`.char_ranges()`]: https://docs.rs/char-ranges/*/char_ranges/trait.CharRangesExt.html#tymethod.char_ranges
[char_ranges]: https://docs.rs/char-ranges/*/char_ranges/trait.CharRangesExt.html#tymethod.char_ranges
[char_ranges()]: https://docs.rs/char-ranges/*/char_ranges/trait.CharRangesExt.html#tymethod.char_ranges
[.char_ranges_offset]: https://docs.rs/char-ranges/*/char_ranges/trait.CharRangesExt.html#tymethod.char_ranges_offset
[offset]: https://docs.rs/char-ranges/0.1.0/char_ranges/struct.CharRanges.html#method.offset
[`CharRanges`]: https://docs.rs/char-ranges/*/char_ranges/struct.CharRanges.html[`.char_indicies()`]: https://doc.rust-lang.org/std/primitive.str.html#method.char_indices
[`DoubleEndedIterator`]: https://doc.rust-lang.org/std/iter/trait.DoubleEndedIterator.html[`last()`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.last
[`nth()`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.nth
[`next_back()`]: https://doc.rust-lang.org/std/iter/trait.DoubleEndedIterator.html#tymethod.next_back
[`nth_back()`]: https://doc.rust-lang.org/std/iter/trait.DoubleEndedIterator.html#method.nth_back