https://github.com/sam0x17/safe-string
Provides a safe interface for working with multi-byte UTF-8 strings in Rust
https://github.com/sam0x17/safe-string
Last synced: 6 months ago
JSON representation
Provides a safe interface for working with multi-byte UTF-8 strings in Rust
- Host: GitHub
- URL: https://github.com/sam0x17/safe-string
- Owner: sam0x17
- License: mit
- Created: 2024-04-02T17:54:20.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-21T06:14:05.000Z (over 1 year ago)
- Last Synced: 2025-02-03T13:40:47.190Z (11 months ago)
- Language: Rust
- Size: 30.3 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# safe-string
[](https://crates.io/crates/safe-string)
[](https://docs.rs/safe-string/latest/safe-string/)
[](https://github.com/sam0x17/safe-string/actions/workflows/ci.yaml?query=branch%3Amain)
[](https://github.com/sam0x17/safe-string/blob/main/LICENSE)
This crate provides replacement types for `String` and `&str` that allow for safe
indexing by character to avoid panics and the usual pitfalls of working with multi-byte UTF-8
characters, namely the scenario where the _byte length_ of a string and the _character length_
of that same string are not the same.
Specifically, `IndexedString` (replaces `String`) and `IndexedSlice` (replaces `&str`) allow
for O(1) slicing and indexing by character, and they will never panic when indexing or slicing.
This is accomplished by storing the character offsets of each character in the string, along
with the original `String`, and using this information to calculate the byte offsets of each
character on the fly. Thus `IndexedString` uses ~2x the memory of a normal `String`, but
`IndexedSlice` and other types implementing `IndexedStr` have only one `usize` extra in
overhead over that of a regular `&str` slice / fat pointer. In theory this could be reduced
down to the same size as a fat pointer using unsafe rust, but this way we get to have
completely safe code and the difference is negligible.
## Examples
```rust
use safe_string::{IndexedString, IndexedStr, IndexedSlice};
let message = IndexedString::from("Hello, δΈη! ππ");
assert_eq!(message.as_str(), "Hello, δΈη! ππ");
assert_eq!(message, "Hello, δΈη! ππ"); // handy PartialEq impls
// Access characters by index
assert_eq!(message.char_at(7), Some('δΈ'));
assert_eq!(message.char_at(100), None); // Out of bounds access returns None
// Slice the IndexedString
let slice = message.slice(7..9);
assert_eq!(slice.as_str(), "δΈη");
// Convert slice back to IndexedString
let sliced_message = slice.to_indexed_string();
assert_eq!(sliced_message.as_str(), "δΈη");
// Nested slicing
let slice = message.slice(0..10);
let nested_slice = slice.slice(3..6);
assert_eq!(nested_slice.as_str(), "lo,");
// Display byte length and character length
assert_eq!(IndexedString::from_str("δΈη").byte_len(), 6); // "δΈη" is 6 bytes in UTF-8
assert_eq!(IndexedString::from_str("δΈη").len(), 2); // "δΈη" has 2 characters
// Demonstrate clamped slicing (no panic)
let clamped_slice = message.slice(20..30);
assert_eq!(clamped_slice.as_str(), "");
// Using `as_str` to interface with standard Rust string handling
let slice = message.slice(0..5);
let standard_str_slice = slice.as_str();
assert_eq!(standard_str_slice, "Hello");
```