https://github.com/bbqsrc/stringly
A Rust library providing generic string types parameterized by encoding.
cesu8 codepages encoding gb18030 strings utf-16 utf-32 utf-8 utf-ebcdic utf7
- Host: GitHub
- URL: https://github.com/bbqsrc/stringly
- Owner: bbqsrc
- Created: 2025-12-17T17:45:34.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-31T19:41:21.000Z (5 months ago)
- Last Synced: 2026-02-02T07:50:54.189Z (4 months ago)
- Topics: cesu8, codepages, encoding, gb18030, strings, utf-16, utf-32, utf-8, utf-ebcdic, utf7
- Language: Rust
- Homepage:
- Size: 1.6 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# stringly
A Rust library providing generic string types parameterized by encoding.
## Overview
`stringly` provides `String<E>` and `Str<E>` types that work like `std`'s `String` and `str`, but are generic over their encoding `E`. This allows you to:
- Write encoding-agnostic code that works with UTF-8, UTF-16, UTF-32, and legacy encodings
- Safely transcode between encodings
- Use the same API you already know from `std`
## Features
- **Generic string types**: `String<E>` (owned) and `Str<E>` (borrowed), parameterized by encoding
- **Full `std` API compatibility**: The same methods as `std::string::String` and `str`
- **Pattern matching**: Generic `Pattern` trait for searching, splitting, and matching
- **Transcoding**: Convert between any two encodings via `chars()` iterators
- **Feature-gated encodings**: Enable only the encodings you need
## Supported Encodings
CESU-8, GB18030, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, UTF-7, UTF-7-IMAP, UTF-8, UTF-EBCDIC, ATARIST, Big5, CP037, CP1006, CP1026, CP424, CP437, CP500, CP737, CP775, CP850, CP852, CP855, CP856, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP875, EUC-JP, EUC-KR, ISO-2022-JP, ISO-8859-1, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, KOI8-R, KOI8-U, KZ1048, MacArabic, MacCeltic, MacCenteuro, MacChinsimp, MacChintrad, MacCroatian, MacCyrillic, MacDevanaga, MacDingbats, MacFarsi, MacGaelic, MacGreek, MacGujarati, MacGurmukhi, MacHebrew, MacIceland, MacInuit, MacJapanese, MacKeyboard, MacKorean, MacRoman, MacRomanian, MacSymbol, MacThai, MacTurkish, Shift_JIS, Windows-1250, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, Windows-1258, Windows-874, Windows-932, Windows-936, Windows-949, Windows-950
## Usage
Add to your `Cargo.toml`:
```toml
[dependencies]
stringly = "0.0.0"
# Enable specific encodings
[features]
default = ["stringly/unicode"] # UTF-16, UTF-32, GB18030, UTF-7, CESU-8, UTF-EBCDIC
```
### Basic Example
```rust
use stringly::{String, Utf8};
// Create a UTF-8 string - same API as std
let s: String<Utf8> = String::from("Hello, world!");
assert!(s.starts_with("Hello"));
```
### Encoding-Generic Functions
```rust
use stringly::{Str, Encoding};
// Write functions that work with any encoding
fn word_count<E: Encoding>(s: &Str<E>) -> usize {
    s.split_whitespace().count()
}
```
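The idea behind an encoding-generic function can be modeled with the standard library alone. The sketch below is not `stringly`'s implementation: `Text`, the one-method `Encoding` trait, and the local `Utf8` and `Utf16Le` marker structs are hypothetical stand-ins, used only to show how a type parameter can select the decoding logic while the calling code stays the same.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for an encoding trait: turns raw bytes into chars.
trait Encoding {
    fn decode(bytes: &[u8]) -> Vec<char>;
}

/// Local marker types for this sketch (not the crate's own types).
struct Utf8;
struct Utf16Le;

impl Encoding for Utf8 {
    fn decode(bytes: &[u8]) -> Vec<char> {
        // Lossy decoding keeps the sketch short; invalid bytes become U+FFFD.
        String::from_utf8_lossy(bytes).chars().collect()
    }
}

impl Encoding for Utf16Le {
    fn decode(bytes: &[u8]) -> Vec<char> {
        // Pair up bytes into little-endian code units; a trailing odd byte is ignored.
        let units = bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
        std::char::decode_utf16(units)
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
            .collect()
    }
}

/// Borrowed bytes tagged with their encoding at the type level.
struct Text<'a, E: Encoding> {
    bytes: &'a [u8],
    _encoding: PhantomData<E>,
}

impl<'a, E: Encoding> Text<'a, E> {
    fn new(bytes: &'a [u8]) -> Self {
        Text { bytes, _encoding: PhantomData }
    }

    fn chars(&self) -> Vec<char> {
        E::decode(self.bytes)
    }
}

/// Counts whitespace-separated words, whatever the encoding is.
fn word_count<E: Encoding>(s: &Text<'_, E>) -> usize {
    let decoded: String = s.chars().into_iter().collect();
    decoded.split_whitespace().count()
}

fn main() {
    let sample = "one two  three";
    let utf8: Vec<u8> = sample.as_bytes().to_vec();
    let utf16le: Vec<u8> = sample
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();

    println!("{}", word_count(&Text::<Utf8>::new(&utf8)));
    println!("{}", word_count(&Text::<Utf16Le>::new(&utf16le)));
}
```

Both calls count three words, even though the underlying byte sequences differ. The encoding is chosen entirely by the type parameter, so `word_count` never inspects the bytes itself.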
### Transcoding
```rust
use stringly::{String, Utf8, Utf16Le, Iso8859_1};
// Universal encodings can convert infallibly via From
let utf8: String<Utf8> = String::from("Hello");
let utf16: String<Utf16Le> = String::from(utf8);
// Limited encodings may fail if characters aren't representable
let utf8: String<Utf8> = String::from("Hello");
let latin1: Option<String<Iso8859_1>> = utf8.try_transcode();
```
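To make the difference between the infallible and fallible paths concrete, here is a minimal std-only sketch of transcoding by way of `char` values. It does not use `stringly`. The function names `utf8_to_utf16le` and `utf8_to_latin1` are hypothetical, and the code only illustrates why a target that covers all of Unicode can always succeed while ISO-8859-1 cannot.

```rust
/// Re-encodes UTF-8 text as UTF-16LE bytes.
/// Always succeeds, because UTF-16 can represent every `char`.
fn utf8_to_utf16le(input: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 2);
    let mut buf = [0u16; 2];
    for c in input.chars() {
        // One code unit for the BMP, a surrogate pair otherwise.
        for unit in c.encode_utf16(&mut buf).iter() {
            out.extend_from_slice(&unit.to_le_bytes());
        }
    }
    out
}

/// Re-encodes UTF-8 text as ISO-8859-1 bytes.
/// Returns `None` as soon as a char above U+00FF is found.
fn utf8_to_latin1(input: &str) -> Option<Vec<u8>> {
    input
        .chars()
        .map(|c| {
            let code_point = c as u32;
            if code_point <= 0xFF {
                Some(code_point as u8)
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    println!("{:02X?}", utf8_to_utf16le("Hi"));
    println!("{:02X?}", utf8_to_latin1("café"));
    println!("{:?}", utf8_to_latin1("日本"));
}
```

Collecting an iterator of `Option<u8>` into `Option<Vec<u8>>` stops at the first unrepresentable character, which mirrors the "may fail" behavior described for limited encodings.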
## Feature Flags
| Feature | Description |
|---------|-------------|
| `unicode` | **Default.** Enables `utf16`, `utf32`, `gb18030`, `utf7`, `cesu8`, `utf-ebcdic` |
| `utf16` | UTF-16 LE/BE |
| `utf32` | UTF-32 LE/BE |
| `gb18030` | GB18030 (Chinese national standard) |
| `utf7` | UTF-7 and UTF-7 IMAP |
| `cesu8` | CESU-8 (compatibility encoding for UTF-16) |
| `utf-ebcdic` | UTF-EBCDIC (Unicode on EBCDIC systems) |
| `codepages` | All legacy codepages |
| `codepages-iso8859` | ISO-8859-* family |
| `codepages-windows` | Windows-* codepages |
| `codepages-dos` | DOS codepages (CP437, etc.) |
| `codepages-apple` | Apple codepages |
| `codepages-misc` | Miscellaneous codepages |
| `codepages-cjk` | Enables `euc-jp`, `iso-2022-jp`, `shift-jis`, `euc-kr`, `big5` |
| `euc-jp` | EUC-JP |
| `iso-2022-jp` | ISO-2022-JP (stateful) |
| `shift-jis` | Shift_JIS |
| `euc-kr` | EUC-KR |
| `big5` | Big5 |
| `registry` | Runtime encoding registry for CLI tools |
## Encoding Traits
The library defines three levels of encoding capability:
- **`Encoding`**: Base trait for all encodings. Provides character iteration, validation, and basic operations.
- **`LimitedEncoding`**: Encodings that can represent a subset of Unicode (e.g., ISO-8859-1).
- **`UniversalEncoding`**: Encodings that can represent all of Unicode (UTF-8, UTF-16, UTF-32, GB18030, UTF-7, CESU-8, UTF-EBCDIC).
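
One way such a capability hierarchy could be expressed is with a base trait plus marker sub-traits, so that the infallible path is only callable for universal encodings. The sketch below is an assumption about the general technique, not the crate's actual trait definitions: the `try_encode` method, the local `Utf8` and `Latin1` types, and the helper functions are all hypothetical.

```rust
/// Base capability: every encoding can attempt to encode a char.
trait Encoding {
    /// Appends the encoded form of `c`, or returns `None` if `c` is unrepresentable.
    fn try_encode(c: char, out: &mut Vec<u8>) -> Option<()>;
}

/// Marker: covers only a subset of Unicode, so encoding may fail.
trait LimitedEncoding: Encoding {}

/// Marker: covers all of Unicode, so encoding never fails.
trait UniversalEncoding: Encoding {}

/// Local marker types for this sketch (not the crate's own types).
struct Utf8;
struct Latin1;

impl Encoding for Utf8 {
    fn try_encode(c: char, out: &mut Vec<u8>) -> Option<()> {
        let mut buf = [0u8; 4];
        out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        Some(())
    }
}
impl UniversalEncoding for Utf8 {}

impl Encoding for Latin1 {
    fn try_encode(c: char, out: &mut Vec<u8>) -> Option<()> {
        let code_point = c as u32;
        if code_point <= 0xFF {
            out.push(code_point as u8);
            Some(())
        } else {
            None
        }
    }
}
impl LimitedEncoding for Latin1 {}

/// Fallible path, available for every encoding.
fn try_encode_all<E: Encoding>(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for c in text.chars() {
        E::try_encode(c, &mut out)?;
    }
    Some(out)
}

/// Infallible path, only callable when the target is universal.
fn encode_all<E: UniversalEncoding>(text: &str) -> Vec<u8> {
    try_encode_all::<E>(text).expect("universal encodings represent every char")
}

fn main() {
    println!("{:02X?}", encode_all::<Utf8>("é"));
    println!("{:02X?}", try_encode_all::<Latin1>("é"));
    println!("{:?}", try_encode_all::<Latin1>("€"));
    // encode_all::<Latin1>("é") would not compile: Latin1 is not a UniversalEncoding.
}
```

With this layout, choosing the wrong path for a limited encoding is a compile-time error, not a runtime surprise.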
## License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.