An open API service indexing awesome lists of open source software.

https://github.com/burntsushi/utf8-ranges

Convert contiguous ranges of Unicode codepoints to UTF-8 byte ranges.
https://github.com/burntsushi/utf8-ranges

Last synced: 6 months ago
JSON representation

Convert contiguous ranges of Unicode codepoints to UTF-8 byte ranges.

Awesome Lists containing this project

README

          

**DEPRECATED:** This crate has been folded into the
[`regex-syntax`](https://docs.rs/regex-syntax) and is now deprecated.

utf8-ranges
===========
This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte
ranges. This is useful when constructing byte based automata from Unicode.
Stated differently, this lets one embed UTF-8 decoding as part of one's
automaton.

[![Linux build status](https://api.travis-ci.org/BurntSushi/utf8-ranges.png)](https://travis-ci.org/BurntSushi/utf8-ranges)
[![](http://meritbadge.herokuapp.com/utf8-ranges)](https://crates.io/crates/utf8-ranges)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).

### Documentation

https://docs.rs/utf8-ranges

### Example

This shows how to convert a scalar value range (e.g., the basic multilingual
plane) to a sequence of byte based character classes.

```rust
extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
println!("{:?}", range);
}
}
```

The output:

```text
[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
```

These ranges can then be used to build an automaton. Namely:

1. Every arbitrary sequence of bytes matches exactly one of the sequences of
ranges or none of them.
2. Every match sequence of bytes is guaranteed to be valid UTF-8. (Erroneous
encodings of surrogate codepoints in UTF-8 cannot match any of the byte
ranges above.)