https://github.com/lifthrasiir/rust-maybe_utf8

[MaybeUtf8][doc] 0.2.3
======================

[![MaybeUTF8 on Travis CI][travis-image]][travis]

[travis-image]: https://travis-ci.org/lifthrasiir/rust-maybe_utf8.png
[travis]: https://travis-ci.org/lifthrasiir/rust-maybe_utf8

Byte container optionally encoded as UTF-8.
It is intended as a byte sequence type whose character encoding is uncertain,
in situations where the caller may be able to determine the actual encoding.

For example, the [ZIP file format](https://en.wikipedia.org/wiki/Zip_%28file_format%29)
originally didn't support UTF-8 file names,
assuming that an archive would only be extracted on a system
with the same system encoding as the one that created it.
The newer ZIP standard explicitly supports UTF-8-encoded file names, though.
In this case, a ZIP library may want to return either a `String` or a `Vec<u8>`
depending on the UTF-8 flag.
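
To make the ZIP case concrete, here is a minimal std-only sketch of the decision such a library faces.
It does not use this crate: `EntryName` and `entry_name` are hypothetical names made up for illustration.
The only outside fact assumed is that the ZIP UTF-8 flag is bit 11 of the general purpose bit flag.

```rust
/// Bit 11 of the ZIP general purpose bit flag: the file name is UTF-8.
const UTF8_FLAG: u16 = 1 << 11;

/// Hypothetical result type for illustration; not part of maybe_utf8.
#[derive(Debug, PartialEq)]
enum EntryName {
    /// The name is known to be valid UTF-8.
    Utf8(String),
    /// The encoding is unknown; the caller has to decide how to decode.
    Unknown(Vec<u8>),
}

/// Hypothetical helper: pick a representation from the flag and the raw bytes.
fn entry_name(flags: u16, raw: Vec<u8>) -> EntryName {
    if flags & UTF8_FLAG != 0 {
        // The flag is only a claim, so validate and keep the bytes on failure.
        match String::from_utf8(raw) {
            Ok(s) => EntryName::Utf8(s),
            Err(e) => EntryName::Unknown(e.into_bytes()),
        }
    } else {
        EntryName::Unknown(raw)
    }
}
```

A single type that covers both variants spares every caller from matching on such an enum by hand, which is roughly the gap this crate aims to fill.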

This crate provides two types,
`MaybeUtf8Buf` (analogous to `String`) and `MaybeUtf8Slice` (analogous to `&str`).
Both types support various conversion methods.
For example, if you know that the bytes are encoded in ISO 8859-2,
[Encoding](https://github.com/lifthrasiir/rust-encoding/) can be used to convert them:

```rust
use std::borrow::IntoCow;
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_2;
use maybe_utf8::{MaybeUtf8Buf, MaybeUtf8Slice};

let namebuf = MaybeUtf8Buf::from_bytes(vec![99,97,102,233]);
assert_eq!(format!("{}", namebuf), "caf\u{fffd}");

// borrowed slice equally works
{
    let nameslice: MaybeUtf8Slice = namebuf.to_slice();
    assert_eq!(format!("{:?}", nameslice), r#"b"caf\xe9""#);
    assert_eq!(nameslice.map_as_cow(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
               "caf\u{e9}");
}

// consuming an optionally-UTF-8-encoded buffer also works
assert_eq!(namebuf.map_into_str(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
           "caf\u{e9}");
```
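
The pattern in the example above (use the bytes as-is when they are valid UTF-8, otherwise hand them to a caller-supplied decoder) can be sketched with the standard library alone.
This is an illustration of the idea, not this crate's implementation: `decode_or` and `latin1` are hypothetical names.
To avoid an external dependency the fallback decodes ISO 8859-1 rather than ISO 8859-2; the two agree on the byte `0xE9` used here.

```rust
use std::borrow::Cow;

/// Hypothetical helper: borrow the input if it is valid UTF-8,
/// otherwise build an owned string with the caller's decoder.
fn decode_or<'a, F>(bytes: &'a [u8], fallback: F) -> Cow<'a, str>
where
    F: FnOnce(&[u8]) -> String,
{
    match std::str::from_utf8(bytes) {
        Ok(s) => Cow::Borrowed(s),
        Err(_) => Cow::Owned(fallback(bytes)),
    }
}

/// ISO 8859-1 maps every byte to the code point with the same value.
fn latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

With these definitions, `decode_or(&[99, 97, 102, 233], latin1)` yields `"caf\u{e9}"`, whereas `String::from_utf8_lossy` on the same bytes yields `"caf\u{fffd}"`, matching the lossy `Display` output asserted above.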

The `IntoMaybeUtf8` trait can be used to uniformly accept either a string or a byte vector
when constructing `MaybeUtf8*` values.

```rust
use maybe_utf8::IntoMaybeUtf8;
assert_eq!("caf\u{e9}".into_maybe_utf8(), b"caf\xc3\xa9".into_maybe_utf8());
```
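
The assertion above suggests that values compare by byte content, regardless of whether they started out as a string or as bytes.
A minimal std-only sketch of that design follows. It is an assumption about the behavior, not this crate's code: `Maybe` and `IntoMaybe` are hypothetical stand-ins.

```rust
/// Hypothetical stand-in for a `MaybeUtf8*` value; not the crate's definition.
#[derive(Debug)]
enum Maybe {
    Utf8(String),
    Bytes(Vec<u8>),
}

impl Maybe {
    /// View the contents as raw bytes, whichever variant holds them.
    fn as_bytes(&self) -> &[u8] {
        match self {
            Maybe::Utf8(s) => s.as_bytes(),
            Maybe::Bytes(v) => v.as_slice(),
        }
    }
}

// Compare by byte content, so the variant does not affect equality.
impl PartialEq for Maybe {
    fn eq(&self, other: &Maybe) -> bool {
        self.as_bytes() == other.as_bytes()
    }
}

/// Hypothetical conversion trait accepting strings and byte containers alike.
trait IntoMaybe {
    fn into_maybe(self) -> Maybe;
}

impl<'a> IntoMaybe for &'a str {
    fn into_maybe(self) -> Maybe {
        Maybe::Utf8(self.to_owned())
    }
}

impl IntoMaybe for String {
    fn into_maybe(self) -> Maybe {
        Maybe::Utf8(self)
    }
}

impl<'a> IntoMaybe for &'a [u8] {
    fn into_maybe(self) -> Maybe {
        Maybe::Bytes(self.to_vec())
    }
}

impl IntoMaybe for Vec<u8> {
    fn into_maybe(self) -> Maybe {
        Maybe::Bytes(self)
    }
}
```

A function can then take `T: IntoMaybe` and accept any of these inputs without the caller converting first.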

[Complete Documentation][doc] is available.

MaybeUtf8 is written by Kang Seonghoon and licensed under the MIT/X11 license.

[doc]: https://lifthrasiir.github.io/rust-maybe_utf8/