Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lifthrasiir/rust-maybe_utf8
Byte container optionally encoded as UTF-8
- Host: GitHub
- URL: https://github.com/lifthrasiir/rust-maybe_utf8
- Owner: lifthrasiir
- License: mit
- Created: 2015-01-14T17:04:15.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-02-21T12:04:40.000Z (almost 10 years ago)
- Last Synced: 2024-10-19T06:16:31.955Z (2 months ago)
- Language: Rust
- Size: 789 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
[MaybeUtf8][doc] 0.2.3
======================

[![MaybeUTF8 on Travis CI][travis-image]][travis]

[travis-image]: https://travis-ci.org/lifthrasiir/rust-maybe_utf8.png
[travis]: https://travis-ci.org/lifthrasiir/rust-maybe_utf8

Byte container optionally encoded as UTF-8.
It is intended as a byte sequence type with uncertain character encoding,
while the caller might be able to determine the actual encoding.

For example, [ZIP file format](https://en.wikipedia.org/wiki/Zip_%28file_format%29)
originally didn't support UTF-8 file names,
assuming the archive would be extracted only in the system
with the same system encoding as the original system.
The newer ZIP standard supports explicitly UTF-8-encoded file names though.
In this case, the ZIP library may want to return either a `String` or a `Vec<u8>`
depending on the UTF-8 flag.

This crate supports two types,
`MaybeUtf8Buf` (analogous to `String`) and `MaybeUtf8Slice` (analogous to `&str`).
Both types support various conversion methods.
For example, if you know that the bytes are encoded in ISO 8859-2,
[Encoding](https://github.com/lifthrasiir/rust-encoding/) can be used to convert them:

```rust
use std::borrow::IntoCow;
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_2;
use maybe_utf8::{MaybeUtf8Buf, MaybeUtf8Slice};

let namebuf = MaybeUtf8Buf::from_bytes(vec![99,97,102,233]);
assert_eq!(format!("{}", namebuf), "caf\u{fffd}");

// borrowed slice equally works
{
    let nameslice: MaybeUtf8Slice = namebuf.to_slice();
    assert_eq!(format!("{:?}", nameslice), r#"b"caf\xe9""#);
    assert_eq!(nameslice.map_as_cow(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
               "caf\u{e9}");
}

// consuming an optionally-UTF-8-encoded buffer also works
assert_eq!(namebuf.map_into_str(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
           "caf\u{e9}");
```

The `IntoMaybeUtf8` trait can be used to uniformly accept either a string or a vector
to construct `MaybeUtf8*` values.

```rust
use maybe_utf8::IntoMaybeUtf8;
assert_eq!("caf\u{e9}".into_maybe_utf8(), b"caf\xc3\xa9".into_maybe_utf8());
```

[Complete Documentation][doc] is available.
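
For readers who want to see the underlying idea without the crate, here is a minimal std-only sketch.
It is not the crate's implementation: `MaybeText`, `IntoMaybeText`, `map_into_string` and `decode_latin1` are hypothetical names made up for illustration.
It also swaps in a hand-written ISO 8859-1 decoder in place of the Encoding crate's ISO 8859-2 decoder (byte `0xE9` is `é` in both).

```rust
use std::fmt;

/// Hypothetical stand-in for a "maybe UTF-8" buffer; not the crate's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum MaybeText {
    /// Contents known to be valid UTF-8.
    Utf8(String),
    /// Contents whose encoding the container does not know.
    Bytes(Vec<u8>),
}

impl MaybeText {
    /// Keeps the input as text when it happens to be valid UTF-8,
    /// otherwise stores the raw bytes untouched.
    fn from_bytes(bytes: Vec<u8>) -> Self {
        match String::from_utf8(bytes) {
            Ok(s) => MaybeText::Utf8(s),
            Err(e) => MaybeText::Bytes(e.into_bytes()),
        }
    }

    /// Consumes the value and returns a `String`, calling `decode`
    /// only when the contents are bytes of unknown encoding.
    fn map_into_string<F>(self, decode: F) -> String
    where
        F: FnOnce(Vec<u8>) -> String,
    {
        match self {
            MaybeText::Utf8(s) => s,
            MaybeText::Bytes(b) => decode(b),
        }
    }
}

impl fmt::Display for MaybeText {
    /// Displays unknown bytes lossily, substituting U+FFFD for invalid sequences.
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MaybeText::Utf8(s) => f.write_str(s),
            MaybeText::Bytes(b) => f.write_str(&String::from_utf8_lossy(b)),
        }
    }
}

/// Hypothetical helper trait: builds a `MaybeText` from either text or bytes.
trait IntoMaybeText {
    fn into_maybe_text(self) -> MaybeText;
}

impl<'a> IntoMaybeText for &'a str {
    fn into_maybe_text(self) -> MaybeText {
        MaybeText::Utf8(self.to_owned())
    }
}

impl<'a> IntoMaybeText for &'a [u8] {
    fn into_maybe_text(self) -> MaybeText {
        MaybeText::from_bytes(self.to_vec())
    }
}

/// Decodes ISO 8859-1, where every byte maps to the code point of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

In this sketch, `from_bytes` normalizes valid UTF-8 input into the `Utf8` variant, so a string and its UTF-8 byte sequence compare equal after conversion.
A caller that knows the real encoding passes a decoder to `map_into_string`, for example `name.map_into_string(|b| decode_latin1(&b))`.
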
MaybeUtf8 is written by Kang Seonghoon and licensed under the MIT/X11 license.

[doc]: https://lifthrasiir.github.io/rust-maybe_utf8/