Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/p-ranav/binary_log

Fast binary logger for C++
https://github.com/p-ranav/binary_log

binary-log blazingly-fast concepts cpp20 cpp20-library fastest file-writing fmt fmtlib header-only logger logger-go-brrr logging logging-library mit-license single-threaded string-formatting super-fast

Last synced: 22 days ago
JSON representation

Fast binary logger for C++

Awesome Lists containing this project

README

        



# Highlights

* Logs messages in a compact binary format
* Fast
* ***Hundreds of millions*** of logs per second
* Average latency of ***1-3 ns*** for basic data types
* See [benchmarks](https://github.com/p-ranav/binary_log#benchmarks)
* Provides an [unpacker](https://github.com/p-ranav/binary_log/tree/master/tools/unpacker) to deflate the log messages
* Uses [fmtlib](https://github.com/fmtlib/fmt) to format the logs
* Synchronous logging - not thread safe
* Header-only library
- Single header file version available [here](https://github.com/p-ranav/binary_log/blob/master/single_include/binary_log/binary_log.hpp)
* Requires C++20
* MIT License

# Usage and Performance

The following code logs 1 billion integers to file.

```cpp
#include

int main()
{
binary_log::binary_log log("log.out");

for (int i = 0; i < 1E9; ++i)
BINARY_LOG(log, "Hello logger, msg number: {}", i);
}
```

On a [modern workstation desktop](#system-details), the above code executes in `~2s`.

| Type | Value |
| --------------- | --------------------|
| Time Taken | 1.935 s |
| Throughput | 2.06 Gb/s |
| Performance | 516 million logs/s |
| Average Latency | 1.73 ns |
| File Size | ~4 GB |

```console
foo@bar:~/dev/binary_log$ time ./build/examples/billion_integers/billion_integers

real 0m1.935s
user 0m0.906s
sys 0m1.000s

foo@bar:~/dev/binary_log$ ls -lart log.out*
-rw-r--r-- 1 pranav pranav 10 Sep 20 11:46 log.out.runlength
-rw-r--r-- 1 pranav pranav 33 Sep 20 11:46 log.out.index
-rw-r--r-- 1 pranav pranav 4000000002 Sep 20 11:46 log.out
```

## Deflate the logs

These binary log files can be deflated using the provided [unpacker](https://github.com/p-ranav/binary_log/tree/master/tools/unpacker) app:

```console
foo@bar:~/dev/binary_log$ time ./build/tools/unpacker/unpacker log.out > log.deflated

real 2m19.853s
user 1m16.078s
sys 0m50.969s

foo@bar:~/dev/binary_log$ ls -lart log.deflated
-rw-r--r-- 1 pranav pranav 35888888890 Dec 6 08:09 log.deflated

foo@bar:~/dev/binary_log$ wc -l log.deflated
1000000000 log.deflated

foo@bar:~/dev/binary_log$ $ head log.deflated
Hello logger, msg number: 0
Hello logger, msg number: 1
Hello logger, msg number: 2
Hello logger, msg number: 3
Hello logger, msg number: 4
Hello logger, msg number: 5
Hello logger, msg number: 6
Hello logger, msg number: 7
Hello logger, msg number: 8
Hello logger, msg number: 9

foo@bar:~/dev/binary_log$ tail log.deflated
Hello logger, msg number: 999999990
Hello logger, msg number: 999999991
Hello logger, msg number: 999999992
Hello logger, msg number: 999999993
Hello logger, msg number: 999999994
Hello logger, msg number: 999999995
Hello logger, msg number: 999999996
Hello logger, msg number: 999999997
Hello logger, msg number: 999999998
Hello logger, msg number: 999999999
```

| Type | Value |
| ------------------- | ---------- |
| Time Taken | 2m 19s |
| Throughput | 258 MB/s |
| Original File Size | ~5 GB |
| Deflated File Size | ~35 GB |
| Log Compression | ***7x*** |

See [benchmarks](https://github.com/p-ranav/binary_log/blob/master/README.md#benchmarks) section for more performance metrics.

# Design Goals & Decisions

* Implement a single-threaded, synchronous logger - Do not provide thread safety
- If the user wants multi-threaded behavior, the user can choose and implement their own queueing solution
- There are numerous well-known lock-free queues available for this purpose ([moody::concurrentqueue](https://github.com/cameron314/concurrentqueue), [atomic_queue](https://github.com/max0x7ba/atomic_queue) etc.) - let the user choose the technology they want to use.
- The latency of enqueuing into a lock-free queue is large enough to matter
- Users who do not care about multi-threaded scenarios should not suffer the cost
- Looking at the [atomic_queue benchmarks](https://max0x7ba.github.io/atomic_queue/html/benchmarks.html), the average round-trip latency across many state-of-the-art multi-producer, multi-consumer queues, to send and receive a 4-byte integer (between 2 threads, using 2 queues) is around 150-250 ns.
* Avoid writing static information more than once
- Examples of static information: the format string, the number of format args, and type of each format arg
- Store the static information in an "index" file
- Store the dynamic information in the log file (refer to the index file where possible)
* Do as little work as possible in the runtime hot path
- No formatting of any kind
- All formatting will happen offline using an unpacker that deflates the binary logs

# How it Works

`binary_log` splits the logging into three files:



1. ***Index file*** contains all the static information from the logs, e.g., format string, number of args, type of each arg etc.
- If a format argument is marked as constant using `binary_log::constant`, the value of the arg is also stored in the index file
2. ***Log file*** contains two pieces of information per log call:
1. An index into the index table (in the index file) to know which format string was used
- If runlength encoding is working, this index might not be written, instead the final runlength will be written to the runlengths file
3. The value of each argument
3. ***Runlength file*** contains runlengths - If a log call is made 5 times, this information is stored here (instead of storing the index 5 times in the log file)
- NOTE: Runlengths are only stored if the runlength > 1 (to avoid the inflation case with RLE)

## Constants

One can specify a log format argument as a constant by wrapping the value with `binary_log::constant(...)`. When this is detected, the value is stored in the index file instead of the log file as it is now considered "static information" and does not change between calls.

```cpp
for (auto i = 0; i < 1E9; ++i) {
BINARY_LOG(log, "Joystick {}: x_min={}, x_max={}, y_min={}, y_max={}",
binary_log::constant("Nintendo Joycon"),
binary_log::constant(-0.6),
binary_log::constant(+0.65),
binary_log::constant(-0.54),
binary_log::constant(+0.71));
}
```

The above loop runs in under `500 ms`. The final output is compact at just `118 bytes` and contains all the information needed to deflate the log (if needed).

| File | Size |
| ------------------ | --------- |
| log.out | 1 byte |
| log.out.runlength | 6 bytes |
| log.out.index | 111 bytes |

```console
foo@bar:~/dev/binary_log$ ls -lart log.out*
-rw-r--r-- 1 pranav pranav 6 Dec 5 08:41 log.out.runlength
-rw-r--r-- 1 pranav pranav 111 Dec 5 08:41 log.out.index
-rw-r--r-- 1 pranav pranav 1 Dec 5 08:41 log.out

foo@bar:~/dev/binary_log$ hexdump -C log.out.index
00000000 33 4a 6f 79 73 74 69 63 6b 20 7b 7d 3a 20 78 5f |3Joystick {}: x_|
00000010 6d 69 6e 3d 7b 7d 2c 20 78 5f 6d 61 78 3d 7b 7d |min={}, x_max={}|
00000020 2c 20 79 5f 6d 69 6e 3d 7b 7d 2c 20 79 5f 6d 61 |, y_min={}, y_ma|
00000030 78 3d 7b 7d 05 0c 0b 0b 0b 0b 01 0f 4e 69 6e 74 |x={}........Nint|
00000040 65 6e 64 6f 20 4a 6f 79 63 6f 6e 01 33 33 33 33 |endo Joycon.3333|
00000050 33 33 e3 bf 01 cd cc cc cc cc cc e4 3f 01 48 e1 |33..........?.H.|
00000060 7a 14 ae 47 e1 bf 01 b8 1e 85 eb 51 b8 e6 3f |z..G.......Q..?|
0000006f
```

# Benchmarks

### System Details

| Type | Value |
| --------------- | --------------------------------------------------------------------------------------------------------- |
| Processor | 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz |
| Installed RAM | 32.0 GB (31.9 GB usable) |
| SSD | [ADATA SX8200PNP](https://www.adata.com/upload/downloadfile/Datasheet_XPG%20SX8200%20Pro_EN_20181017.pdf) |
| OS | Ubuntu 20.04 LTS running on WSL in Windows 11 |
| C++ Compiler | g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0 |

```console
foo@bar:~/dev/binary_log$ ./build/benchmark/binary_log_benchmark
2022-09-20T12:59:39-05:00
Running ./build/benchmark/binary_log_benchmark
Run on (16 X 3504 MHz CPU s)
Load Average: 0.52, 0.58, 0.59
------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
BM_binary_log_static_integer/42 1.22 ns 1.20 ns 560000000 Latency=1.19978ns Logs/s=833.488M/s
BM_binary_log_static_integer/395 1.43 ns 1.43 ns 448000000 Latency=1.42997ns Logs/s=699.317M/s
BM_binary_log_static_integer/3123456789 1.89 ns 1.84 ns 373333333 Latency=1.84152ns Logs/s=543.03M/s
BM_binary_log_static_integer/9876543123456789 5.45 ns 2.76 ns 248888889 Latency=2.76228ns Logs/s=362.02M/s
BM_binary_log_static_integer/-42 1.25 ns 1.26 ns 560000000 Latency=1.25558ns Logs/s=796.444M/s
BM_binary_log_static_integer/-395 1.54 ns 1.57 ns 448000000 Latency=1.56948ns Logs/s=637.156M/s
BM_binary_log_static_integer/-123456789 1.94 ns 1.97 ns 373333333 Latency=1.96708ns Logs/s=508.369M/s
BM_binary_log_static_integer/-9876543123456789 4.11 ns 2.92 ns 235789474 Latency=2.91574ns Logs/s=342.967M/s
BM_binary_log_static_float 1.82 ns 1.84 ns 407272727 Latency=1.84152ns Logs/s=543.03M/s
BM_binary_log_static_double 3.29 ns 2.73 ns 263529412 Latency=2.7274ns Logs/s=366.65M/s
BM_binary_log_static_string 4.93 ns 2.92 ns 235789474 Latency=2.91574ns Logs/s=342.967M/s
BM_binary_log_random_integer 5.75 ns 5.72 ns 112000000 Latency=5.71987ns Logs/s=174.829M/s
BM_binary_log_random_integer 6.08 ns 6.14 ns 112000000 Latency=6.13839ns Logs/s=162.909M/s
BM_binary_log_random_integer 7.51 ns 7.67 ns 89600000 Latency=7.67299ns Logs/s=130.327M/s
BM_binary_log_random_integer 15.0 ns 15.0 ns 44800000 Latency=14.9972ns Logs/s=66.6791M/s
BM_binary_log_random_integer 5.70 ns 5.72 ns 112000000 Latency=5.71987ns Logs/s=174.829M/s
BM_binary_log_random_integer 5.84 ns 5.86 ns 112000000 Latency=5.85938ns Logs/s=170.667M/s
BM_binary_log_random_integer 7.89 ns 7.67 ns 89600000 Latency=7.67299ns Logs/s=130.327M/s
BM_binary_log_random_integer 14.9 ns 15.0 ns 44800000 Latency=14.9972ns Logs/s=66.6791M/s
BM_binary_log_random_real 6.29 ns 6.25 ns 100000000 Latency=6.25ns Logs/s=160M/s
BM_binary_log_random_real 11.6 ns 11.7 ns 64000000 Latency=11.7188ns Logs/s=85.3333M/s
BM_binary_log_billion_integers 2320246800 ns 1765625000 ns 1 Latency=1.76562ns Logs/s=566.372M/s
```

## Supported Format Argument Types

`binary_log` supports a limited number of types of format arguments. They are:

```
bool,
char,
uint8_t, uint16_t, uint32_t, uint64_t
int8_t, int16_t, int32_t, int64_t,
float, double,
const char*,
std::string,
std::string_view
```

# Building and installing

See the [BUILDING](BUILDING.md) document.

## Generating Single Header

```bash
python3 utils/amalgamate/amalgamate.py -c single_include.json -s .
```

# Contributing

See the [CONTRIBUTING](CONTRIBUTING.md) document.

# License

The project is available under the [MIT](https://opensource.org/licenses/MIT) license.