https://github.com/eliaskosunen/scnlib

scanf for modern C++

# scnlib

[![Linux builds](https://github.com/eliaskosunen/scnlib/actions/workflows/linux.yml/badge.svg)](https://github.com/eliaskosunen/scnlib/actions/workflows/linux.yml)
[![macOS builds](https://github.com/eliaskosunen/scnlib/actions/workflows/macos.yml/badge.svg)](https://github.com/eliaskosunen/scnlib/actions/workflows/macos.yml)
[![Windows builds](https://github.com/eliaskosunen/scnlib/actions/workflows/windows.yml/badge.svg)](https://github.com/eliaskosunen/scnlib/actions/workflows/windows.yml)
[![Other architectures](https://github.com/eliaskosunen/scnlib/actions/workflows/arch.yml/badge.svg)](https://github.com/eliaskosunen/scnlib/actions/workflows/arch.yml)
[![Code Coverage](https://codecov.io/gh/eliaskosunen/scnlib/graph/badge.svg?token=LyWrDluna1)](https://codecov.io/gh/eliaskosunen/scnlib)

[![Latest Release](https://img.shields.io/github/v/release/eliaskosunen/scnlib?sort=semver&display_name=tag)](https://github.com/eliaskosunen/scnlib/releases)
[![License](https://img.shields.io/github/license/eliaskosunen/scnlib.svg)](https://github.com/eliaskosunen/scnlib/blob/master/LICENSE)
[![C++ Standard](https://img.shields.io/badge/C%2B%2B-17%2F20%2F23-blue.svg)](https://img.shields.io/badge/C%2B%2B-17%2F20%2F23-blue.svg)
[![Documentation](https://img.shields.io/badge/Documentation-scnlib.dev-blue)](https://scnlib.dev)

```cpp
#include <scn/scan.h>
#include <print> // for std::println (C++23)

int main() {
    // Read two integers from stdin
    // with an accompanying message
    if (auto result =
            scn::prompt<int, int>("What are your two favorite numbers? ", "{} {}")) {
        auto [a, b] = result->values();
        std::println("Oh, cool, {} and {}!", a, b);
    } else {
        std::println(stderr, "Error: {}", result.error().msg());
    }
}
```

Try it out in [Compiler Explorer](https://godbolt.org/z/oG71eorvE).

## What is this?

`scnlib` is a modern C++ library for replacing `scanf` and `std::istream`.
This library attempts to move us ever so much closer to replacing `iostream`s
and C `stdio` altogether.
It's faster than `iostream` (see Benchmarks), and type-safe, unlike `scanf`.
Think [{fmt}](https://github.com/fmtlib/fmt) or C++20 `std::format`, but in the
other direction.

This library is the reference implementation of the ISO C++ standards proposal
[P1729 "Text Parsing"](https://wg21.link/p1729).

## Documentation

The documentation can be found online, at https://scnlib.dev.

To build the docs yourself, build the `scn_docs` target generated by CMake.
This target is generated only if the variable `SCN_DOCS` is set in CMake
(done automatically if scnlib is the root project).
The `scn_docs` target requires Doxygen, Python 3.8 or newer, and the `pip3`
package `poxy`.

## Examples

See more examples in the `examples/` folder.

### Reading a `std::string`

```cpp
#include <scn/scan.h>
#include <print>

int main() {
    // Reading a std::string will read until the first whitespace character
    if (auto result = scn::scan<std::string>("Hello world!", "{}")) {
        // Will output "Hello":
        // Access the read value with result->value()
        std::println("{}", result->value());

        // Will output " world!":
        // result->range() returns a subrange containing the unused input
        // C++23 is required for the std::string_view range constructor used below
        std::println("{}", std::string_view{result->range()});
    } else {
        std::println("Couldn't parse a word: {}", result.error().msg());
    }
}
```

### Reading multiple values

```cpp
#include <scn/scan.h>

int main() {
    auto input = std::string{"123 456 foo"};

    auto result = scn::scan<int, int>(input, "{} {}");
    // result == true
    // result->range(): " foo"

    // All read values can be accessed through a tuple with result->values()
    auto [a, b] = result->values();

    // Read from the remaining input
    // Could also use scn::ranges::subrange{result->begin(), result->end()} as input
    auto result2 = scn::scan<std::string>(result->range(), "{}");
    // result2 == true
    // result2->range().empty() == true
    // result2->value() == "foo"
}
```

### Reading from a fancier range

```cpp
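#include <scn/scan.h>

#include <ranges>
#include <string_view>

int main() {
    // A string_view is used so that the literal's null terminator
    // is not part of the reversed input
    auto result = scn::scan<int>(std::string_view{"123"} | std::views::reverse, "{}");
    // result == true
    // result->begin() is an iterator into a reverse_view
    // result->range() is empty
    // result->value() == 321
}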
```

### Repeated reading

```cpp
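#include <scn/scan.h>
#include <string_view>
#include <vector>

using namespace std::string_view_literals; // for the "..."sv literal

int main() {
    std::vector<int> vec{};
    auto input = scn::ranges::subrange{"123 456 789"sv};

    // Keep scanning until the input is exhausted or a value fails to parse
    while (auto result = scn::scan<int>(input, "{}")) {
        vec.push_back(result->value());
        input = result->range();
    }
}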
```

## Features

* Blazing fast parsing of values (see Benchmarks)
* Modern C++ interface, featuring
    * type safety (variadic templates, types not determined by the format
      string)
    * convenience (ranges)
    * ergonomics (values returned from `scn::scan`, no output parameters)
* `"{python}"`-like format string syntax
    * Including compile-time format string checking
* Minimal code size increase (in user code, see Benchmarks)
* Usable without exceptions, RTTI, or `<iostream>`s
    * Configurable through build flags
    * Limited functionality if enabled
* Supports and requires Unicode (input is UTF-8, UTF-16, or UTF-32)
* Highly portable
    * Tested on multiple platforms, see CI
    * Works on multiple architectures, tested on x86, x86-64, arm, aarch64,
      riscv64, and ppc64le

## Installing

`scnlib` uses CMake.
If your project already uses CMake, integration should be trivial, through
whatever means you like:
`make install` + `find_package`, `FetchContent`, `git submodule` + `add_subdirectory`,
or something else.

There are community-maintained packages available
on [Conan](https://conan.io/center/recipes/scnlib) and
on [vcpkg](https://github.com/microsoft/vcpkg/tree/master/ports/scnlib).

The `scnlib` CMake target is `scn::scn`:

```cmake
# Target with which you'd like to use scnlib
add_executable(my_program ...)
target_link_libraries(my_program scn::scn)
```

See docs for usage without CMake.

## Compiler support

A C++17-compatible compiler is required. The following compilers are tested in
CI:

* GCC 7 and newer
* Clang 8 and newer
* Visual Studio 2019 and 2022

Including the following environments:

* 32-bit and 64-bit builds on Windows
* libc++ on Linux
* gcc on Alpine Linux
* AppleClang and gcc on macOS 12 (Monterey) and 14 (Sonoma)
* clang-cl with VS 2019 and 2022
* MinGW and MSys2
* GCC on armv6, armv7, aarch64, riscv64, s390x, and ppc64le
* Visual Studio 2022, cross-compiling to arm64

## Benchmarks

### Run-time performance

All times below are in nanoseconds of CPU time.
Lower is better.

#### Integer parsing (`int`)

![Integer result, chart](benchmark/runtime/results/int.png)

| Test | Test 1 `"single"` | Test 2 `"repeated"` | Test average |
|:---------------------------------|------------------:|--------------------:|-------------:|
| `scn::scan` | 23.8 | 30.4 | 27.1 |
| `scn::scan_value` | 20.5 | 27.4 | 24.0 |
| `scn::scan_int` | 16.5 | 24.1 | 20.3 |
| `scn::scan_int_exhaustive_valid` | 4.08 | - | 4.08 |
| `std::stringstream` | 117 | 53.9 | 85.5 |
| `sscanf` | 71.3 | 474 | 272.7 |
| `strtol` | 16.3 | 23.8 | 20.1 |
| `std::from_chars` | 8.73 | 13.0 | 10.9 |
| `fast_float::from_chars` | 6.87 | 11.8 | 9.35 |

#### Floating-point number parsing (`double`)

![Float result, chart](benchmark/runtime/results/float.png)

| Test | Test 1 `"single"` | Test 2 `"repeated"` | Test average |
|:-------------------------|------------------:|--------------------:|-------------:|
| `scn::scan` | 55.8 | 63.7 | 59.7 |
| `scn::scan_value` | 52.1 | 58.8 | 55.5 |
| `std::stringstream` | 294 | 271 | 283 |
| `sscanf` | 159 | 704 | 432 |
| `strtod` | 79.1 | 153 | 116 |
| `std::from_chars` | 18.0 | 28.1 | 23.0 |
| `fast_float::from_chars` | 20.6 | 27.8 | 24.2 |

#### String "word" (whitespace-separated character sequence) parsing (`string` and `string_view`)

![String result, chart](benchmark/runtime/results/string.png)

| Test                           | Time (ns) |
|:-------------------------------|----------:|
| `scn::scan<string>`            |      24.5 |
| `scn::scan<string_view>`       |      22.2 |
| `scn::scan_value<string>`      |      23.1 |
| `scn::scan_value<string_view>` |      21.0 |
| `std::stringstream`            |       134 |
| `sscanf`                       |      58.4 |

#### Conclusions

* `scn::scan` is always faster than using `stringstream`s and `sscanf`
* `std::from_chars`/`fast_float::from_chars` is faster than `scn::scan`, but it
supports fewer features
* `strtod` is slower than `scn::scan`, and supports fewer features
* `scn::scan_value` is slightly faster compared to `scn::scan`
* `scn::scan_int` is faster than both `scn::scan` and `scn::scan_value`
* `strtol` is ~on-par with `scn::scan_int`.
* `scn::scan_int_exhaustive_valid` is blazing-fast.
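
One concrete instance of the "fewer features" trade-off, using only the standard library: `std::from_chars` does not skip leading whitespace, while `std::strtol` does. The sketch below is an illustration only and is not taken from the benchmark sources. The helper names `parse_from_chars` and `parse_strtol` are hypothetical.

```cpp
#include <cerrno>
#include <charconv>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse an int with std::from_chars.
// from_chars does not skip leading whitespace and ignores the locale.
std::optional<int> parse_from_chars(std::string_view s) {
    int value{};
    const auto res = std::from_chars(s.data(), s.data() + s.size(), value);
    if (res.ec != std::errc{}) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: parse a long with std::strtol.
// strtol skips leading whitespace, but needs a null-terminated string
// and reports errors through errno and the end pointer.
std::optional<long> parse_strtol(const std::string& s) {
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str() || errno == ERANGE) {
        return std::nullopt;
    }
    return value;
}
```

Here `parse_from_chars(" 42")` is empty because of the leading space, whereas `parse_strtol(" 42")` holds `42`.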

#### About

Above,

* "Test 1" refers to scanning a single value from a string,
which only contains the text representation for that value.
The time used for creating any state needed for the scanner is included,
for example, constructing a `stringstream`. This test is called `"single"` in
the benchmark sources.
* "Test 2" refers to the average time of scanning a single value from a string
that contains multiple values in their text representations, separated by
spaces. The time used for creating any state needed for the scanner
is not included. This test is called `"repeated"` in the benchmark sources.
* The string test is an exception: strings are read one after another from a
sample of Lorem Ipsum.

The difference between "Test 1" and "Test 2" is most pronounced when using
a `stringstream`, which is relatively expensive to construct,
and seems to add around 50 ns of runtime.
With `sscanf`, using the `%n` specifier and skipping whitespace
seem to be really expensive (~400 ns of runtime).
With `scn::scan` and `std::from_chars`, there's really no state to construct,
and the results for "Test 1" and "Test 2" are thus quite similar.
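
The difference between the two setups can be sketched with `std::istringstream`. This is a simplified illustration of what each test includes, not the actual benchmark code, and it contains no timing logic. The function names `parse_single` and `parse_repeated` are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// "single": the parser state (here, a std::istringstream) is built for
// every value, so its construction cost is part of each measurement.
int parse_single(const std::string& text) {
    std::istringstream stream{text};
    int value{};
    stream >> value;
    return value;
}

// "repeated": the state is built once, then many whitespace-separated
// values are read from it, so construction is not paid per value.
std::vector<int> parse_repeated(const std::string& text) {
    std::istringstream stream{text};
    std::vector<int> values;
    for (int value{}; stream >> value;) {
        values.push_back(value);
    }
    return values;
}
```

A "single"-style benchmark would time each call to `parse_single`, while a "repeated"-style benchmark would time only the loop body inside `parse_repeated` and divide by the number of values read.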

These benchmarks were run on a Fedora 40 machine, running the Linux kernel version
6.8.9, with an AMD Ryzen 7 5700X processor, and compiled with clang version 18.1.1,
with `-O3 -DNDEBUG -march=haswell` and LTO enabled.
These benchmarks were run on 2024-05-23 (commit 3fd830de).

The source code for these benchmarks can be found in the `benchmark` directory.
You can run these benchmarks yourself by enabling the CMake
variable `SCN_BENCHMARKS`.
This variable is `ON` by default, if `scnlib` is the root CMake project,
and `OFF` otherwise.

```sh
$ cd build
$ cmake -DSCN_BENCHMARKS=ON \
      -DCMAKE_BUILD_TYPE=Release -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON \
      -DSCN_USE_HASWELL_ARCH=ON ..
$ cmake --build .
# choose benchmarks to run in ./benchmark/runtime/*/*_bench
$ ./benchmark/runtime/integer/scn_int_bench
```

### Executable size

All sizes below are in kibibytes (KiB), measuring the compiled executable.
"Stripped size" shows the size of the executable after running `strip`.
Lower is better.

#### Release build (`-O3 -DNDEBUG` + LTO)

![Release result, chart](benchmark/binarysize/graph-release.png)

Size of `scnlib` shared library (`.so`): 1.7M

| Method | Executable size | Stripped size |
| :------------- | --------------: | ------------: |
| empty | 7.6 | 4.4 |
| `std::scanf` | 10.4 | 5.8 |
| `std::istream` | 11.1 | 6.2 |
| `scn::input` | 11.2 | 6.4 |

#### Minimized (MinSizeRel) build (`-Os -DNDEBUG` + LTO)

![MinSizeRel result, chart](benchmark/binarysize/graph-minsizerel.png)

Size of `scnlib` shared library (`.so`): 1.1M

| Method | Executable size | Stripped size |
| :------------- | --------------: | ------------: |
| empty | 7.5 | 4.4 |
| `std::scanf` | 10.3 | 5.8 |
| `std::istream` | 11.0 | 6.1 |
| `scn::input` | 12.4 | 6.6 |

#### Debug build (`-g -O0`)

![Debug result, chart](benchmark/binarysize/graph-debug.png)

Size of `scnlib` shared library (`.so`): 20M

| Method | Executable size | Stripped size |
| :------------- | --------------: | ------------: |
| empty | 18.4 | 5.2 |
| `std::scanf` | 429 | 11.8 |
| `std::istream` | 438 | 9.4 |
| `scn::input` | 2234 | 51.3 |

#### Conclusions

When using optimized builds, depending on compiler flags, scnlib provides a
binary, the size of which is within ~5% of what would be produced with `scanf`
or `<iostream>`s.
In a Debug environment, scnlib is ~5x bigger when compared to `scanf`
or `<iostream>`. After `strip`ping the binaries,
these differences largely go away, except in Debug builds.

#### About

In these tests, 25 translation units are generated, in all of which values are
read from `stdin` five times.
This is done to simulate a small project.
`scnlib` is linked dynamically, to level the playing field with the standard
library, which is also dynamically linked.

The code was compiled on Fedora 40, with GCC 14.1.1.
See the directory `benchmark/binarysize` for the source code.

You can run these benchmarks yourself by enabling the CMake
variable `SCN_BENCHMARKS_BINARYSIZE`.
This variable is `ON` by default, if `scnlib` is the root CMake project,
and `OFF` otherwise.

```sh
$ cd build
# For Debug
$ cmake -DCMAKE_BUILD_TYPE=Debug \
      -DSCN_BENCHMARKS_BINARYSIZE=ON \
      -DBUILD_SHARED_LIBS=ON ..
# For Release and MinSizeRel,
# add -DCMAKE_BUILD_TYPE=$BUILD_TYPE and
# -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON

$ cmake --build .
$ ./benchmark/binarysize/run_binarysize_bench.py ./benchmark/binarysize $BUILD_TYPE
```

### Build time

#### Build time

Time is in seconds of CPU time (user time + sys/kernel time).
Lower is better.

| Method | Debug | Release |
|:-------------|------:|--------:|
| empty | 0.05 | 0.05 |
| `scanf` | 0.22 | 0.20 |
| `<iostream>` | 0.28 | 0.27 |
| `scn::input` | 0.54 | 0.45 |

#### Memory consumption

Memory is in mebibytes (MiB) used while compiling.
Lower is better.

| Method | Debug | Release |
|:-------------|------:|--------:|
| empty | 21.0 | 23.3 |
| `scanf` | 56.3 | 53.6 |
| `<iostream>` | 67.8 | 65.0 |
| `scn::input` | 102 | 91.0 |

#### Conclusions

Code using scnlib takes around 2x longer to compile compared to `<iostream>`,
and also uses around 1.5x more memory.
Release builds seem to be slightly faster as compared to Debug builds.

#### About

These tests measure the time it takes to compile a binary when using different
libraries.
The time taken to compile the library itself is not taken into account
(the standard library is precompiled, anyway).

These tests were run on a Fedora 40 machine, with an AMD Ryzen 7 5700X
processor, using GCC version 14.1.1.
The compiler flags used for a Debug build were `-g`, and `-O3 -DNDEBUG` for a
Release build.

You can run these benchmarks yourself by enabling the CMake
variable `SCN_BENCHMARKS_BUILDTIME`.
This variable is `ON` by default, if `scnlib` is the root CMake project,
and `OFF` otherwise.
For these tests to work, `c++` must point to a GCC-compatible C++
compiler binary,
and a somewhat POSIX-compatible `/usr/bin/time` must be available.

```sh
$ cd build
$ cmake -DSCN_BENCHMARKS_BUILDTIME=ON ..
$ cmake --build .
$ ./benchmark/buildtime/run-buildtime-tests.sh
```

## Acknowledgements

The contents of this library are heavily influenced by {fmt} and its derivative
works.
https://github.com/fmtlib/fmt

The design of this library is also inspired by the Python `parse` library:
https://github.com/r1chardj0n3s/parse

### Third-party libraries

*fast_float* for floating-point number parsing:
https://github.com/fastfloat/fast_float

*NanoRange* for a minimal `<ranges>` implementation:
https://github.com/tcbrindle/NanoRange

## License

scnlib is licensed under the Apache License, version 2.0.
Copyright (c) 2017 Elias Kosunen
See LICENSE for further details.