https://github.com/squeek502/zig-unidecode
Very approximate UTF-8 to ASCII transliterator (a Zig implementation of the Text::Unidecode Perl module)
- Host: GitHub
- URL: https://github.com/squeek502/zig-unidecode
- Owner: squeek502
- License: 0bsd
- Created: 2022-01-02T09:25:55.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-01-02T09:45:14.000Z (about 3 years ago)
- Last Synced: 2024-12-01T14:46:12.683Z (2 months ago)
- Topics: transliteration, unidecode, zig, zig-package, ziglang
- Language: Zig
- Homepage:
- Size: 81.1 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# zig-unidecode
A [Zig](https://ziglang.org/) implementation of the [Text::Unidecode Perl module](https://metacpan.org/pod/Text::Unidecode) to convert UTF-8 text into a (very) approximate ASCII-only transliteration. That is, this is "meant to be a transliterator of last resort."
For a more detailed description including motivation, caveats, etc, see:
https://metacpan.org/pod/Text::Unidecode
## Examples
| UTF-8 | Transliterated ASCII |
| ------------- | ------------- |
| `"ÿéáh"` | `"yeah"` |
| `"北亰"` | `"Bei Jing "` |
| `"Славься"` | `"Slav'sia"` |
| `"[██ ] 50%"` | `"[## ] 50%"` |

## Some things worth noting
- The returned output will only contain ASCII characters (`0x00`-`0x7F`).
- Any ASCII characters in the input are passed through to the output unchanged.
- A single non-ASCII code point may be transliterated to a variable number of ASCII characters (including 0).
- Code points above `0x7F` are never transliterated to output containing ASCII control characters other than `\n`.
- Unknown code points may be transliterated to `[?]`.

## The different functions provided
### `unidecodeAlloc`
Takes an allocator in order to handle any input size safely. This should be used for most use-cases.
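
As a rough illustration of the allocator-based pattern, the following self-contained sketch measures the transliterated length first and then allocates exactly that many bytes. It does not use this library: `transliterateAlloc` and the three-entry `approx` table are hypothetical stand-ins, and the real `unidecodeAlloc` may size its output differently.

```zig
const std = @import("std");

// Hypothetical stand-in for a real transliteration table (illustration only).
fn approx(cp: u21) []const u8 {
    return switch (cp) {
        0xFF => "y", // ÿ
        0xE9 => "e", // é
        0xE1 => "a", // á
        else => "[?]", // anything this toy table does not know
    };
}

// Hypothetical helper: returns a newly allocated ASCII approximation of `utf8`.
// The caller owns the returned slice and must free it with the same allocator.
fn transliterateAlloc(allocator: std.mem.Allocator, utf8: []const u8) ![]u8 {
    const view = try std.unicode.Utf8View.init(utf8);

    // Pass 1: measure the output, since one code point can map to 0..n bytes.
    var len: usize = 0;
    var it = view.iterator();
    while (it.nextCodepoint()) |cp| {
        len += if (cp < 0x80) 1 else approx(cp).len;
    }

    const out = try allocator.alloc(u8, len);

    // Pass 2: fill the buffer; ASCII input is copied through unchanged.
    var i: usize = 0;
    it = view.iterator();
    while (it.nextCodepoint()) |cp| {
        if (cp < 0x80) {
            out[i] = @intCast(cp);
            i += 1;
        } else {
            const s = approx(cp);
            @memcpy(out[i..][0..s.len], s);
            i += s.len;
        }
    }
    return out;
}

test "allocating sketch" {
    const out = try transliterateAlloc(std.testing.allocator, "ÿéáh");
    defer std.testing.allocator.free(out);
    try std.testing.expectEqualStrings("yeah", out);
}
```

The sketch assumes a recent Zig toolchain (single-argument `@intCast`, two-argument `@memcpy`) and can be run with `zig test` on the file.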
### `unidecodeBuf`
Takes a `dest` slice that must be large enough to hold the transliterated ASCII. Because the output size can vary greatly depending on the input, this is unsafe unless it is known ahead of time that the transliterated output will fit (for example, when the input is known at comptime).
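
To make the sizing concern concrete, here is a toy, self-contained sketch of a buffer-based transliterator. It is not this library's implementation: `transliterateBuf`, its `lookup` table, and the choice to return `error.NoSpaceLeft` are all assumptions made for illustration. The point is that a 3-byte code point such as `北` can expand to 4 ASCII bytes, so a `dest` sized to the input length is not guaranteed to be large enough.

```zig
const std = @import("std");

// Hypothetical three-entry table; a real table covers far more code points.
fn lookup(cp: u21) []const u8 {
    return switch (cp) {
        0xE9 => "e", // é
        0x5317 => "Bei ", // 北 (3 bytes of UTF-8 become 4 bytes of ASCII)
        0x2588 => "#", // █
        else => "[?]",
    };
}

// Hypothetical helper: writes an ASCII approximation of `utf8` into `dest`
// and returns the number of bytes written. Unlike an unchecked version,
// this sketch reports an error instead of writing past the end of `dest`.
fn transliterateBuf(dest: []u8, utf8: []const u8) !usize {
    const view = try std.unicode.Utf8View.init(utf8);
    var it = view.iterator();
    var n: usize = 0;
    while (it.nextCodepoint()) |cp| {
        var ascii: [1]u8 = undefined;
        const out: []const u8 = if (cp < 0x80) blk: {
            ascii[0] = @intCast(cp); // ASCII passes through unchanged
            break :blk &ascii;
        } else lookup(cp);
        if (n + out.len > dest.len) return error.NoSpaceLeft;
        @memcpy(dest[n..][0..out.len], out);
        n += out.len;
    }
    return n;
}

test "buffer sketch" {
    var buf: [32]u8 = undefined;
    const n = try transliterateBuf(&buf, "é北█a");
    try std.testing.expectEqualStrings("eBei #a", buf[0..n]);

    // A buffer as long as the 3-byte input is still too small for "Bei ".
    var small: [3]u8 = undefined;
    try std.testing.expectError(error.NoSpaceLeft, transliterateBuf(&small, "北"));
}
```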
### `unidecodeStringLiteral`
A way to transliterate a UTF-8 string literal into ASCII at compile time.