https://github.com/ltla/byteme
C++ utilities for simple buffered inputs.
- Host: GitHub
- URL: https://github.com/ltla/byteme
- Owner: LTLA
- License: mit
- Created: 2021-12-26T01:38:12.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2026-03-25T01:07:39.000Z (3 months ago)
- Last Synced: 2026-03-26T07:20:03.375Z (3 months ago)
- Language: C++
- Homepage: https://ltla.github.io/byteme/
- Size: 3.06 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Gimme some bytes


## Overview
This library implements a few functors to read buffered inputs from uncompressed or Gzip-compressed files or buffers.
Classes can be exchanged at compile- or run-time to easily re-use the same code across different input sources.
The aim is to consolidate some common boilerplate across several projects, e.g., [**tatami**](https://github.com/LTLA/tatami), [**singlepp**](https://github.com/LTLA/singlepp).
Interfacing with Zlib is particularly fiddly and I don't want to be forced to remember how to do it in each project.
## Usage
To read bytes, create an instance of the desired `Reader` class and loop until no bytes remain in the source.
```cpp
#include "byteme/byteme.hpp"
#include <vector>

const char* filepath = "input.gz";
byteme::GzipFileReader reader(filepath, {});

std::vector<unsigned char> buffer(20);
while (true) {
    // read() returns the number of bytes that were actually read into the buffer.
    auto num_read = reader.read(buffer.data(), buffer.size());

    /* Do something with the first 'num_read' bytes in the buffer */

    if (num_read < buffer.size()) {
        // If fewer bytes are read than requested, the input is finished.
        break;
    }
}
```
To write bytes, create the desired `Writer` class and supply an array of bytes until completion.
```cpp
#include "byteme/byteme.hpp"
#include <string>
#include <vector>

std::vector<std::string> lyrics {
    "Kimi dake o kimi dake o",
    "Suki de ita yo",
    "Kaze de me ga nijinde",
    "Tooku naru yo"
};

byteme::GzipFileWriter writer("something.gz", {});
const char newline = '\n';
for (const auto& line : lyrics) {
    writer.write(reinterpret_cast<const unsigned char*>(line.c_str()), line.size());
    writer.write(reinterpret_cast<const unsigned char*>(&newline), 1);
}
writer.finish();
```
More details can be found in the [reference documentation](https://ltla.github.io/byteme).
## Supported classes
For the readers:
| Class | Description |
|-------|-------------|
|`RawBufferReader`| Read from an uncompressed buffer|
|`RawFileReader`| Read from an uncompressed file|
|`ZlibBufferReader`| Read from a Zlib-compressed buffer|
|`GzipFileReader`| Read from a Gzip-compressed file|
|`IstreamReader`| Read from a `std::istream`|
For the writers:
| Class | Description |
|-------|-------------|
|`RawBufferWriter`| Write to an uncompressed buffer|
|`RawFileWriter`| Write to an uncompressed file|
|`ZlibBufferWriter`| Write to a Zlib-compressed buffer|
|`GzipFileWriter`| Write to a Gzip-compressed file|
|`OstreamWriter`| Write to a `std::ostream`|
The different subclasses can be switched at compile time via templating, or at run-time by exploiting the class hierarchy:
```cpp
#include "byteme/byteme.hpp"
#include <cstddef>
#include <memory>
#include <vector>

std::vector<unsigned char> input_buffer;
auto buffer = input_buffer.data();
std::size_t length = input_buffer.size();

std::unique_ptr<byteme::Reader> ptr;
if (some_condition) {
    ptr.reset(new byteme::ZlibBufferReader(buffer, length, {}));
} else {
    ptr.reset(new byteme::RawBufferReader(buffer, length));
}

// Read bytes into the output buffer from an abstract input source.
std::vector<unsigned char> output(123);
auto available = ptr->read(output.data(), output.size());
```
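For the compile-time route, a function template can accept any class with a matching `read()` method. The sketch below is self-contained and does not use **byteme** itself: `ToyStringReader`, `ToyStreamReader` and `count_newlines()` are hypothetical stand-ins that only mimic the `read()` contract described in the usage section, i.e., return the number of bytes read, with a short read signalling the end of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: serves bytes from an in-memory string.
class ToyStringReader {
public:
    explicit ToyStringReader(std::string data) : my_data(std::move(data)) {}

    // Returns the number of bytes copied, which is less than 'n'
    // only when the source is exhausted.
    std::size_t read(unsigned char* out, std::size_t n) {
        std::size_t take = std::min(n, my_data.size() - my_pos);
        std::memcpy(out, my_data.data() + my_pos, take);
        my_pos += take;
        return take;
    }

private:
    std::string my_data;
    std::size_t my_pos = 0;
};

// Hypothetical stand-in: serves bytes from any std::istream.
class ToyStreamReader {
public:
    explicit ToyStreamReader(std::istream& in) : my_in(&in) {}

    std::size_t read(unsigned char* out, std::size_t n) {
        my_in->read(reinterpret_cast<char*>(out), static_cast<std::streamsize>(n));
        return static_cast<std::size_t>(my_in->gcount());
    }

private:
    std::istream* my_in;
};

// The source type is a template parameter, so it is chosen at compile time.
template<class Reader_>
std::size_t count_newlines(Reader_& reader, std::size_t chunk_size = 16) {
    assert(chunk_size > 0);
    std::vector<unsigned char> chunk(chunk_size);
    std::size_t total = 0;
    while (true) {
        auto num_read = reader.read(chunk.data(), chunk.size());
        auto end = chunk.begin() + static_cast<std::ptrdiff_t>(num_read);
        total += static_cast<std::size_t>(
            std::count(chunk.begin(), end, static_cast<unsigned char>('\n')));
        if (num_read < chunk.size()) {
            break; // short read: the input is finished.
        }
    }
    return total;
}
```

Swapping `ToyStringReader` for `ToyStreamReader` requires no changes to `count_newlines()`; the same should hold for any class that exposes a compatible `read()`.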
Most of the `Reader` and `Writer` constructors will also accept a matching `Options` instance to fine-tune their behavior.
```cpp
// For readers.
byteme::ZlibBufferReaderOptions zopt;
zopt.buffer_size = 8096;
zopt.mode = byteme::ZlibCompressionMode::GZIP;
byteme::ZlibBufferReader zreader(buffer, length, zopt);
// For writers.
byteme::ZlibBufferWriterOptions zwopt;
zwopt.buffer_size = 8096;
zwopt.mode = byteme::ZlibCompressionMode::DEFLATE;
zwopt.compression_level = 9;
byteme::ZlibBufferWriter zwriter(zwopt);
```
## Buffered reading and writing
Some applications need to access small chunks or individual bytes from the input stream.
Calling `Reader::read()` for each request could be too expensive, e.g., if each call makes some attempt to access a storage device.
In such cases, users can create a `BufferedReader` instance to wrap each `Reader`.
This will read a large chunk into a buffer from which smaller chunks or individual bytes can be extracted.
```cpp
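// Any Reader subclass can be wrapped; GzipFileReader is used here for illustration.
// std::make_unique() cannot deduce a bare '{}', so default options are passed explicitly.
auto reader = std::make_unique<byteme::GzipFileReader>(filepath, byteme::GzipFileReaderOptions());
// Assuming a byte type of char, to match the 'char' returned by get().
byteme::SerialBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}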
```
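To show why buffering helps, here is a minimal sketch of the idea. It is not **byteme**'s implementation: `ToyBufferedReader` is a hypothetical class that wraps a `std::istream`, copies the `valid()`/`get()`/`advance()` interface from the example above, and counts how often it goes back to the underlying source via a made-up `num_refills()` method.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy: refills an internal buffer from the source only
// when every buffered byte has been consumed.
class ToyBufferedReader {
public:
    ToyBufferedReader(std::istream& source, std::size_t buffer_size) :
        my_source(&source), my_buffer(buffer_size)
    {
        assert(buffer_size > 0);
        refill();
    }

    // True if get() would return a byte.
    bool valid() const { return my_pos < my_available; }

    // Current byte; only call this when valid() is true.
    char get() const { return my_buffer[my_pos]; }

    // Move to the next byte, refilling if the buffer is used up.
    // Returns whether a byte is available afterwards.
    bool advance() {
        ++my_pos;
        if (my_pos == my_available) {
            refill();
        }
        return valid();
    }

    // Number of (potentially expensive) reads issued to the source.
    std::size_t num_refills() const { return my_refills; }

private:
    void refill() {
        my_source->read(my_buffer.data(), static_cast<std::streamsize>(my_buffer.size()));
        my_available = static_cast<std::size_t>(my_source->gcount());
        my_pos = 0;
        ++my_refills;
    }

    std::istream* my_source;
    std::vector<char> my_buffer;
    std::size_t my_pos = 0;
    std::size_t my_available = 0;
    std::size_t my_refills = 0;
};

// Collects every byte using the same loop shape as the example above.
inline std::string drain(ToyBufferedReader& pb) {
    std::string seen;
    bool valid = pb.valid();
    while (valid) {
        seen += pb.get();
        valid = pb.advance();
    }
    return seen;
}
```

The caller still sees one byte at a time, but the source is only touched once per buffer-full, plus one final read that detects the end of the input.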
We can also extract a range of bytes:
```cpp
// GzipFileReader and the char byte type are used here for illustration.
auto reader = std::make_unique<byteme::GzipFileReader>(filepath, byteme::GzipFileReaderOptions());
byteme::SerialBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    std::int32_t value;
    auto outcome = pb.extract(reinterpret_cast<char*>(&value), sizeof(std::int32_t));
    if (outcome.first != sizeof(std::int32_t)) {
        // uh oh, not enough bytes.
    } else {
        // do something with the extracted integer.
    }
    valid = outcome.second;
}
```
We can even perform the reading in a separate thread via the `ParallelBufferedReader` class.
This allows the (possibly expensive) disk IO operations to be performed in parallel to the user-level parsing.
```cpp
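// GzipFileReader and the char byte type are used here for illustration.
auto reader = std::make_unique<byteme::GzipFileReader>(filepath, byteme::GzipFileReaderOptions());
byteme::ParallelBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}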
```
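The overlap between IO and parsing can be sketched with `std::async`. Again, this is a hypothetical toy rather than how `ParallelBufferedReader` is actually written: `ToyPrefetcher` and its `next()` method are invented for illustration, and a `std::istream` stands in for the underlying reader.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy: while the caller parses one chunk, the next chunk
// is already being read on another thread.
class ToyPrefetcher {
public:
    ToyPrefetcher(std::istream& source, std::size_t chunk_size) :
        my_source(&source), my_chunk_size(chunk_size)
    {
        assert(chunk_size > 0);
        launch();
    }

    // Blocks until the pending read finishes, then starts the next one.
    // An empty chunk signals the end of the input.
    std::vector<char> next() {
        if (!my_pending.valid()) {
            return {}; // input was already exhausted.
        }
        std::vector<char> chunk = my_pending.get();
        if (!chunk.empty()) {
            launch();
        }
        return chunk;
    }

private:
    // At most one read is in flight, so only one thread touches the stream.
    void launch() {
        std::istream* source = my_source;
        std::size_t n = my_chunk_size;
        my_pending = std::async(std::launch::async, [source, n]() {
            std::vector<char> chunk(n);
            source->read(chunk.data(), static_cast<std::streamsize>(n));
            chunk.resize(static_cast<std::size_t>(source->gcount()));
            return chunk;
        });
    }

    std::istream* my_source;
    std::size_t my_chunk_size;
    std::future<std::vector<char>> my_pending;
};
```

Each call to `next()` hands back a chunk that was read while the previous one was being processed. Depending on the toolchain, thread support may need to be enabled explicitly, e.g., with GCC's `-pthread`.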
Similarly, `BufferedWriter` will cache all write requests into a large buffer,
intermittently calling `Writer::write()` to push the buffered bytes to the underlying storage.
```cpp
// GzipFileWriter and the char byte type are used here for illustration.
auto writer = std::make_unique<byteme::GzipFileWriter>(filepath, byteme::GzipFileWriterOptions());
byteme::SerialBufferedWriter<char> pb(std::move(writer), /* buffer_size = */ 65536);
std::string input("foobarwhee");
for (auto i : input) { // write individual bytes.
    pb.write(i);
}
pb.write(input.c_str(), input.size()); // or write an array.
pb.finish(); // flush everything to file.
```
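The caching behaviour on the write side can be sketched in the same way. `ToyBufferedWriter` below is a hypothetical class writing to a `std::ostream`, not **byteme**'s own code; it mirrors the `write()`/`finish()` calls from the example above and adds a made-up `num_flushes()` counter to show how rarely the sink is touched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy: accumulates bytes and only pushes them to the sink
// when the buffer is full or finish() is called.
class ToyBufferedWriter {
public:
    ToyBufferedWriter(std::ostream& sink, std::size_t buffer_size) :
        my_sink(&sink), my_capacity(buffer_size)
    {
        assert(buffer_size > 0);
        my_buffer.reserve(buffer_size);
    }

    // Write a single byte.
    void write(char byte) {
        my_buffer.push_back(byte);
        if (my_buffer.size() == my_capacity) {
            flush();
        }
    }

    // Write an array, filling the buffer piece by piece.
    void write(const char* data, std::size_t n) {
        while (n > 0) {
            std::size_t take = std::min(n, my_capacity - my_buffer.size());
            my_buffer.insert(my_buffer.end(), data, data + take);
            data += take;
            n -= take;
            if (my_buffer.size() == my_capacity) {
                flush();
            }
        }
    }

    // Push any remaining bytes to the sink.
    void finish() {
        flush();
        my_sink->flush();
    }

    // Number of writes issued to the sink.
    std::size_t num_flushes() const { return my_flushes; }

private:
    void flush() {
        if (my_buffer.empty()) {
            return;
        }
        my_sink->write(my_buffer.data(), static_cast<std::streamsize>(my_buffer.size()));
        my_buffer.clear();
        ++my_flushes;
    }

    std::ostream* my_sink;
    std::size_t my_capacity;
    std::vector<char> my_buffer;
    std::size_t my_flushes = 0;
};
```

Until `finish()` is called, the tail of the data may still be sitting in the buffer, which is why the example above ends with an explicit `pb.finish()`.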
## Building projects
### CMake using `FetchContent`
If you're using CMake, you just need to add something like this to your `CMakeLists.txt`:
```cmake
include(FetchContent)
FetchContent_Declare(
  byteme
  GIT_REPOSITORY https://github.com/LTLA/byteme
  GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(byteme)
```
Then you can link to **byteme** to make the headers available during compilation:
```cmake
# For executables:
target_link_libraries(myexe byteme)
# For libraries:
target_link_libraries(mylib INTERFACE byteme)
```
### CMake using `find_package()`
You can install the library by cloning a suitable version of this repository and running the following commands:
```sh
mkdir build && cd build
cmake .. -DBYTEME_TESTS=OFF
cmake --build . --target install
```
Then you can use `find_package()` as usual:
```cmake
find_package(ltla_byteme CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::byteme)
```
### Manual
If you're not using CMake, the simple approach is to just copy the files in the `include/` subdirectory,
either directly or with Git submodules, and include their path during compilation with, e.g., GCC's `-I`.
### Adding Zlib support
To support Gzip-compressed files, we also need to link to Zlib.
When using CMake, **byteme** will automatically attempt to use `find_package()` to find the system Zlib.
If Zlib cannot be found, the dependency is skipped and the library provides no Gzip functionality.
Users can also set the `BYTEME_FIND_ZLIB` option to `OFF` to provide their own Zlib.
## Further comments
I thought about using C++ streams, much like how the [**zstr**](https://github.com/mateidavid/zstr) library handles Gzip (de)compression.
However, I'm not very knowledgeable about the `std::istream` interface, so I decided to go with something simpler.
Just in case, I did add a `byteme::IstreamReader` class so that **byteme** clients can easily leverage custom streams.