https://github.com/ltla/byteme

C++ utilities for simple buffered inputs.
Host: GitHub
URL: https://github.com/ltla/byteme
Owner: LTLA
License: mit
Created: 2021-12-26T01:38:12.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2026-03-25T01:07:39.000Z (3 months ago)
Last Synced: 2026-03-26T07:20:03.375Z (3 months ago)
Language: C++
Homepage: https://ltla.github.io/byteme/
Size: 3.06 MB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Gimme some bytes

![Unit tests](https://github.com/LTLA/byteme/actions/workflows/run-tests.yaml/badge.svg)

![Documentation](https://github.com/LTLA/byteme/actions/workflows/doxygenate.yaml/badge.svg)

[![codecov](https://codecov.io/gh/LTLA/byteme/branch/master/graph/badge.svg?token=7I3UBJLHSO)](https://codecov.io/gh/LTLA/byteme)

## Overview

This library implements a few functors to read buffered inputs from uncompressed or Gzip-compressed files or buffers.

Classes can be exchanged at compile- or run-time to easily re-use the same code across different input sources.

The aim is to consolidate some common boilerplate across several projects, e.g., [**tatami**](https://github.com/LTLA/tatami), [**singlepp**](https://github.com/LTLA/singlepp).

Interfacing with Zlib is particularly fiddly and I don't want to be forced to remember how to do it in each project.

## Usage

To read bytes, create an instance of the desired `Reader` class and loop until no bytes remain in the source.

```cpp
#include "byteme/byteme.hpp"
#include <vector>

const char* filepath = "input.gz";
byteme::GzipFileReader reader(filepath, {});

// Buffer of bytes to be filled by the reader.
std::vector<unsigned char> buffer(20);
while (true) {
    // read() returns the number of bytes that were actually read into the buffer.
    auto num_read = reader.read(buffer.data(), buffer.size());

    /* Do something with the first 'num_read' bytes in the buffer */

    if (num_read < buffer.size()) {
        // If fewer bytes are read than requested, the input is finished.
        break;
    }
}
```

To write bytes, create the desired `Writer` class and supply an array of bytes until completion.

```cpp
#include "byteme/byteme.hpp"
#include <string>
#include <vector>

std::vector<std::string> lyrics {
    "Kimi dake o kimi dake o",
    "Suki de ita yo",
    "Kaze de me ga nijinde",
    "Tooku naru yo"
};

byteme::GzipFileWriter writer("something.gz", {});
const char newline = '\n';
for (const auto& line : lyrics) {
    // write() takes a pointer to bytes and the number of bytes to write.
    writer.write(reinterpret_cast<const unsigned char*>(line.c_str()), line.size());
    writer.write(reinterpret_cast<const unsigned char*>(&newline), 1);
}
writer.finish();
```

More details can be found in the [reference documentation](https://ltla.github.io/byteme).

## Supported classes

For the readers:

| Class | Description |
|-------|-------------|
|`RawBufferReader`| Read from an uncompressed buffer|
|`RawFileReader`| Read from an uncompressed file|
|`ZlibBufferReader`| Read from a Zlib-compressed buffer|
|`GzipFileReader`| Read from a Gzip-compressed file|
|`IstreamReader`| Read from a `std::istream`|

For the writers:

| Class | Description |
|-------|-------------|
|`RawBufferWriter`| Write to an uncompressed buffer|
|`RawFileWriter`| Write to an uncompressed file|
|`ZlibBufferWriter`| Write to a Zlib-compressed buffer|
|`GzipFileWriter`| Write to a Gzip-compressed file|
|`OstreamWriter`| Write to a `std::ostream`|

The different subclasses can be switched at compile time via templating, or at run-time by exploiting the class hierarchy:

```cpp
#include "byteme/byteme.hpp"
#include <memory>
#include <vector>

std::vector<unsigned char> input_buffer;
auto buffer = input_buffer.data();
size_t length = input_buffer.size();

// Pick the concrete reader at run-time behind the common Reader interface.
std::unique_ptr<byteme::Reader> ptr;
if (some_condition) {
    ptr.reset(new byteme::ZlibBufferReader(buffer, length, {}));
} else {
    ptr.reset(new byteme::RawBufferReader(buffer, length));
}

// Read bytes into the output buffer from an abstract input source.
std::vector<unsigned char> output(123);
auto available = ptr->read(output.data(), output.size());
```
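The example above covers the run-time route. For the compile-time route, one possible sketch is a function template that accepts any type with a compatible `read()` method. `VectorReader` and `slurp()` are hypothetical names written for this illustration so that the snippet compiles without **byteme**; they are not part of the library. The sketch assumes that the real readers expose `read(unsigned char*, std::size_t)` in the same way as the earlier examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a byteme reader, so that this sketch compiles on its own.
// It serves bytes from an in-memory vector through the same read() shape used above.
class VectorReader {
public:
    explicit VectorReader(std::vector<unsigned char> source) : my_source(std::move(source)) {}

    // Copies up to 'n' bytes into 'buffer' and returns the number actually copied.
    std::size_t read(unsigned char* buffer, std::size_t n) {
        std::size_t num = std::min(n, my_source.size() - my_position);
        std::copy_n(my_source.begin() + my_position, num, buffer);
        my_position += num;
        return num;
    }

private:
    std::vector<unsigned char> my_source;
    std::size_t my_position = 0;
};

// Collects all bytes from any type with a compatible read().
// The reader type is a template parameter, so it is fixed at compile time.
template<class Reader_>
std::vector<unsigned char> slurp(Reader_& reader, std::size_t chunk_size) {
    assert(chunk_size > 0); // a zero-length chunk would never signal the end of input.
    std::vector<unsigned char> chunk(chunk_size), output;
    while (true) {
        auto num_read = reader.read(chunk.data(), chunk.size());
        output.insert(output.end(), chunk.begin(), chunk.begin() + num_read);
        if (num_read < chunk.size()) {
            // Fewer bytes than requested, so the input is finished.
            break;
        }
    }
    return output;
}
```

Under that assumption, the same `slurp()` could presumably be instantiated with `byteme::RawFileReader` or `byteme::GzipFileReader` without going through a base class pointer.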

Most of the `Reader` and `Writer` constructors will also accept a matching `Options` instance to fine-tune their behavior.

```cpp
// For readers.
byteme::ZlibBufferReaderOptions zopt;
zopt.buffer_size = 8096;
zopt.mode = byteme::ZlibCompressionMode::GZIP;
byteme::ZlibBufferReader zreader(buffer, length, zopt);

// For writers.
byteme::ZlibBufferWriterOptions zwopt;
zwopt.buffer_size = 8096;
zwopt.mode = byteme::ZlibCompressionMode::DEFLATE;
zwopt.compression_level = 9;
byteme::ZlibBufferWriter zwriter(zwopt);
```

## Buffered reading and writing

Some applications need to access small chunks or individual bytes from the input stream.

Calling `Reader::read()` for each request could be too expensive, e.g., if each call makes some attempt to access a storage device.

In such cases, users can create a `BufferedReader` class to wrap each `Reader`.

This will read a large chunk into a buffer from which smaller chunks or individual bytes can be extracted.

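```cpp
// GzipFileReader and the <char> element type are assumed here for illustration;
// any Reader subclass can be wrapped in the same way.
std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::SerialBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);

auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}
```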
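To make the buffering idea concrete, here is a minimal sketch of how such a wrapper could work: one large read fills an internal buffer, and `get()`/`advance()` then serve bytes from memory until the buffer runs out. `ToyBufferedReader` and `read_through_buffer()` are hypothetical names invented for this illustration, and the source is modelled as a plain callable; this is not how **byteme** implements its `BufferedReader` classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the buffering idea; this is not byteme's implementation.
class ToyBufferedReader {
public:
    // Fills up to 'n' bytes and returns the number written, mimicking Reader::read().
    using Source = std::function<std::size_t(unsigned char*, std::size_t)>;

    ToyBufferedReader(Source source, std::size_t buffer_size) :
        my_source(std::move(source)), my_buffer(buffer_size)
    {
        assert(buffer_size > 0);
        refill();
    }

    // Whether a byte is available at the current position.
    bool valid() const { return my_position < my_available; }

    // Current byte; only call this when valid() is true.
    unsigned char get() const { return my_buffer[my_position]; }

    // Moves to the next byte, touching the source only when the buffer is exhausted.
    bool advance() {
        ++my_position;
        if (my_position >= my_available) {
            refill();
        }
        return valid();
    }

    // Number of calls made to the underlying source so far.
    std::size_t source_calls() const { return my_source_calls; }

private:
    void refill() {
        my_position = 0;
        if (my_finished) {
            my_available = 0;
            return;
        }
        my_available = my_source(my_buffer.data(), my_buffer.size());
        my_finished = (my_available < my_buffer.size()); // short read: source is done.
        ++my_source_calls;
    }

    Source my_source;
    std::vector<unsigned char> my_buffer;
    std::size_t my_available = 0, my_position = 0, my_source_calls = 0;
    bool my_finished = false;
};

// Usage: walk over every byte of 'data' through a small buffer.
std::string read_through_buffer(const std::string& data, std::size_t buffer_size, std::size_t& source_calls) {
    std::size_t offset = 0;
    ToyBufferedReader br([&](unsigned char* out, std::size_t n) -> std::size_t {
        std::size_t num = std::min(n, data.size() - offset);
        std::copy_n(data.begin() + offset, num, out);
        offset += num;
        return num;
    }, buffer_size);

    std::string seen;
    for (bool ok = br.valid(); ok; ok = br.advance()) {
        seen += static_cast<char>(br.get());
    }
    source_calls = br.source_calls();
    return seen;
}
```

The point of the design is that the number of calls to the source scales with the input size divided by the buffer size, rather than with the number of bytes requested.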

We can also extract a range of bytes:

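```cpp
// GzipFileReader and the <char> element type are assumed here for illustration;
// any Reader subclass can be wrapped in the same way.
std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::SerialBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);

auto valid = pb.valid();
while (valid) {
    std::int32_t value;
    auto outcome = pb.extract(reinterpret_cast<char*>(&value), sizeof(std::int32_t));
    if (outcome.first != sizeof(std::int32_t)) {
        // uh oh, not enough bytes.
    } else {
        // do something with the extracted integer.
    }
    valid = outcome.second;
}
```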

We can even perform the reading in a separate thread via the `ParallelBufferedReader` class.

This allows the (possibly expensive) disk IO operations to be performed in parallel to the user-level parsing.

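```cpp
// GzipFileReader and the <char> element type are assumed here for illustration;
// any Reader subclass can be wrapped in the same way.
std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::ParallelBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);

auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}
```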

Similarly, `BufferedWriter` will cache all write requests into a large buffer, intermittently calling `Writer::write()` to push the buffered bytes to the underlying storage.

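```cpp
// GzipFileWriter and the <char> element type are assumed here for illustration;
// any Writer subclass can be wrapped in the same way.
std::unique_ptr<byteme::Writer> writer(new byteme::GzipFileWriter(filepath, {}));
byteme::SerialBufferedWriter<char> pb(std::move(writer), /* buffer_size = */ 65536);

std::string input("foobarwhee");
for (auto i : input) { // write individual bytes.
    pb.write(i);
}
pb.write(input.c_str(), input.size()); // or write an array.
pb.finish(); // flush everything to file.
```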
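The caching behavior on the writing side can be sketched in the same spirit. Bytes accumulate in memory and are handed to the underlying sink only when the buffer fills up or when `finish()` is called. `ToyBufferedWriter` and `write_through_buffer()` are hypothetical names for this illustration, with the sink modelled as a plain callable; the actual `BufferedWriter` classes in **byteme** may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of write buffering; this is not byteme's implementation.
class ToyBufferedWriter {
public:
    // Receives a pointer to bytes and their count, mimicking Writer::write().
    using Sink = std::function<void(const unsigned char*, std::size_t)>;

    ToyBufferedWriter(Sink sink, std::size_t buffer_size) :
        my_sink(std::move(sink)), my_capacity(buffer_size)
    {
        assert(buffer_size > 0);
        my_buffer.reserve(buffer_size);
    }

    // Caches a single byte, flushing to the sink once the buffer is full.
    void write(unsigned char byte) {
        my_buffer.push_back(byte);
        if (my_buffer.size() == my_capacity) {
            flush();
        }
    }

    // Caches an array of bytes by reusing the single-byte path.
    void write(const unsigned char* bytes, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            write(bytes[i]);
        }
    }

    // Pushes any remaining cached bytes to the sink.
    void finish() { flush(); }

private:
    void flush() {
        if (!my_buffer.empty()) {
            my_sink(my_buffer.data(), my_buffer.size());
            my_buffer.clear();
        }
    }

    Sink my_sink;
    std::size_t my_capacity;
    std::vector<unsigned char> my_buffer;
};

// Usage: write 'input' one byte at a time and report how often the sink was called.
std::string write_through_buffer(const std::string& input, std::size_t buffer_size, std::size_t& sink_calls) {
    std::string output;
    sink_calls = 0;
    ToyBufferedWriter writer([&](const unsigned char* bytes, std::size_t n) {
        output.append(reinterpret_cast<const char*>(bytes), n);
        ++sink_calls;
    }, buffer_size);

    for (char c : input) {
        writer.write(static_cast<unsigned char>(c));
    }
    writer.finish();
    return output;
}
```

Forgetting `finish()` in this sketch would leave the final partial buffer unwritten, which is why the example above ends with an explicit `pb.finish()`.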

## Building projects

### CMake using `FetchContent`

If you're using CMake, you just need to add something like this to your `CMakeLists.txt`:

```cmake
include(FetchContent)

FetchContent_Declare(
  byteme
  GIT_REPOSITORY https://github.com/LTLA/byteme
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(byteme)
```

Then you can link to **byteme** to make the headers available during compilation:

```cmake
# For executables:
target_link_libraries(myexe byteme)

# For libraries:
target_link_libraries(mylib INTERFACE byteme)
```

### CMake using `find_package()`

You can install the library by cloning a suitable version of this repository and running the following commands:

```sh
mkdir build && cd build
cmake .. -DBYTEME_TESTS=OFF
cmake --build . --target install
```

Then you can use `find_package()` as usual:

```cmake
find_package(ltla_byteme CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::byteme)
```

### Manual

If you're not using CMake, the simple approach is to just copy the files in the `include/` subdirectory, either directly or with Git submodules, and include their path during compilation with, e.g., GCC's `-I`.

### Adding Zlib support

To support Gzip-compressed files, we also need to link to Zlib.

When using CMake, **byteme** will automatically attempt to use `find_package()` to find the system Zlib.

If no Zlib is found, it is skipped and no Gzip functionality is provided by the library.

Users can also set the `BYTEME_FIND_ZLIB` option to `OFF` to provide their own Zlib.

## Further comments

I thought about using C++ streams, much like how the [**zstr**](https://github.com/mateidavid/zstr) library handles Gzip (de)compression.

However, I'm not very knowledgeable about the `std::istream` interface, so I decided to go with something simpler.

Just in case, I did add a `byteme::IstreamReader` class so that **byteme** clients can easily leverage custom streams.