https://github.com/vspefs/char_db

A modern C++ header-only library for general encoding/decoding of characters, built around native language features like ranges.

Host: GitHub
URL: https://github.com/vspefs/char_db
Owner: vspefs
License: agpl-3.0
Created: 2025-05-20T14:11:46.000Z (8 months ago)
Default Branch: master
Last Pushed: 2025-06-16T08:42:14.000Z (7 months ago)
Last Synced: 2025-06-16T09:54:01.547Z (7 months ago)
Language: C++
Homepage:
Size: 48.8 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# char_db (WIP)

A modern C++ header-only library for general encoding/decoding of characters, built around native language features
like `char8_t`, `char16_t`, `char32_t`, and ranges, with built-in support for UTF-8, UTF-16, and UTF-32.

The author is a lazy dumb ass and this library is still a work in progress, which leaves vast room for contributions,
bug fixes, and improvements. For example, the author hasn't even figured out the lowest language standard supported. See
[TODO](TODO.md) for a to-do list.

## Features

- Header-only, zero-dependency (except standard library)
- Built-in Unicode character validation and code point conversion for UTF-8, UTF-16, UTF-32
- Range-based views for iterating subsequences that encode Unicode characters
- Designed for constexpr and compile-time usage (though no such view has been implemented yet)
- Official integration to a wide range of build systems (we have 2 now I guess that's a win)
- C++20 module support (can you believe I'm lazy enough to not have done this?)

## Usage

Include the main header in your project:

```cpp
#include
```

Example: Iterate over UTF-8 code points in a string

```cpp
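#include <cstdint>     // std::uint32_t
#include <print>       // std::println
#include <ranges>      // std::ranges::size
#include <string_view> // the sv literal
// Also include the char_db main header here.

int
main ()
{
  using namespace std::literals;
  auto seq = u8"Hello, 🌍!"sv;
  for (auto subseq : seq | char_db::views::decoding)
    {
      auto cp = char_db::utf8::to_code_point (subseq);
      auto mblen = std::ranges::size (subseq);
      // Cast to an integer type so the code point can be formatted as hex.
      std::println ("U+{:04X}, using {} UTF-8 code units",
                    static_cast<std::uint32_t> (cp), mblen);
    }
}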
```

## Building & Installing

This project uses CMake:

```sh
cmake -B build
cmake --build build
cmake --install build
```

To build the example, enable the `BUILD_EXAMPLE` option:

```sh
cmake -B build -DBUILD_EXAMPLE=ON
cmake --build build
```

## (Current) API Overview

- `char_db::utf8`, `char_db::utf16`, `char_db::utf32`: Static interfaces for encoding/decoding and validation
- `char_db::views::decoding`: Range adaptor for decoding code unit sequences into code points
- `char_db::views::decoded`: Range adaptor for iterating decoded code unit sequences that represent valid Unicode code points
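
The overview lists UTF-16 next to UTF-8, so here is a small sketch of the arithmetic a UTF-16 encoder and decoder has to do for surrogate pairs. Everything in it is an assumption made for illustration (`utf16_units`, `is_scalar_value`, `to_utf16`, `from_surrogates`); none of these names come from char_db, and `char_db::utf16` may expose something quite different.

```cpp
#include <array>
#include <cstddef>
#include <optional>

namespace sketch // hypothetical namespace, not part of char_db
{
// Up to two UTF-16 code units plus how many of them are in use.
struct utf16_units
{
  std::array<char16_t, 2> units{};
  std::size_t size = 0;
};

// A Unicode scalar value is any code point except the surrogate range.
constexpr bool
is_scalar_value (char32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as UTF-16; std::nullopt if it is not a scalar value.
constexpr std::optional<utf16_units>
to_utf16 (char32_t cp)
{
  if (!is_scalar_value (cp)) return std::nullopt;
  if (cp < 0x10000) // Basic Multilingual Plane: one unit, value unchanged
    return utf16_units{ { static_cast<char16_t> (cp) }, 1 };
  cp -= 0x10000;    // 20 bits remain, split 10/10 across the pair
  return utf16_units{ { static_cast<char16_t> (0xD800 + (cp >> 10)),
                        static_cast<char16_t> (0xDC00 + (cp & 0x3FF)) },
                      2 };
}

// Combine a high and a low surrogate back into a code point.
// The caller is expected to have checked that both are in range.
constexpr char32_t
from_surrogates (char16_t high, char16_t low)
{
  return 0x10000 + ((static_cast<char32_t> (high) - 0xD800) << 10)
         + (static_cast<char32_t> (low) - 0xDC00);
}
} // namespace sketch

// U+1F30D round-trips through the surrogate pair D83C DF0D.
static_assert (sketch::to_utf16 (U'\U0001F30D')->units[0] == 0xD83C);
static_assert (sketch::from_surrogates (0xD83C, 0xDF0D) == U'\U0001F30D');
```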

## Contributing

Just contribute, bro. Note that I use (my own version of) GNU Code Style. You can use whatever style you want, because
I'll reformat the code before merging. Sorry if you're not a fan of GNU Style. Everyone has their own kinks.

## License

AGPL-3.0 License. See [LICENSE](LICENSE) for details.
