// Render with Asciidoctor

= yactfr
Philippe Proulx
3 April 2024
:idprefix:
:idseparator: -
ifdef::env-github[]
:toc: macro
endif::env-github[]
ifndef::env-github[]
:toc: left
endif::env-github[]

_**yactfr**_ (yac·tif·er) is _**y**et **a**nother **CTF r**eader_.

The yactfr library is written in pass:[C++14] and offers a (🥁...)
pass:[C++14] API.

While the https://diamon.org/ctf/[CTF] reading libraries that I know
about focus on decoding and providing you with complete, ordered event
record objects, the yactfr API offers a lower level of CTF processing:
you iterate individual element sequences to obtain _elements_:
beginning/end of packet, beginning/end of event record, beginning/end of
structure, individual data stream scalar values like fixed-length
integers, fixed-length floating point numbers, and null-terminated
strings, specific clock value update, known data stream ID, and the
rest. This means the sizes of event records and packets don't influence
the performance or memory usage of yactfr.

Some use cases where this library can prove useful:

* Trace inspection programs, for example when you need to debug a custom
CTF producer.

* Foundation of an API with a higher level of abstraction.

* Targeted offline analyses with low memory usage.

The name _yactfr_ takes its inspiration from
https://lloyd.github.io/yajl/[yajl], a JSON parser with a similar
approach.

WARNING: yactfr is not mature yet! Its API is still experimental:
the interface could change at any time.

ifdef::env-github[]
toc::[]
endif::env-github[]

== Notable features

* Full https://diamon.org/ctf/v1.8.3/[CTF{nbsp}1.8.3] and
https://diamon.org/ctf[CTF{nbsp}2] support.

* Only depends on http://www.boost.org/[Boost] at _build time_, and
nothing at run time (except for your pass:[C++] runtime library):
+
----
$ ldd libyactfr.so
linux-vdso.so.1 (0x00007ffe98fa5000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fb8337d2000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fb83343d000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fb833225000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fb832e69000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007fb833e51000)
----

* Simple, documented pass:[C++14] API with a focus on immutability.

* Efficient data stream decoding: zero memory allocation, zero smart
pointer reference count change, zero virtual method call, and zero
copy during an element sequence iteration.

** yactfr decodes a data stream fixed-length bit array in place: the
result is available as an element member.

** A data stream string/BLOB is available as one or more raw data parts
in consecutive elements. The actual raw data is a pointer to the
current data block of the data source (for example, from a memory
mapped region), which means zero string/BLOB copy.
+
This is possible because, as per CTF{nbsp}1.8.3 and CTF2-SPEC-2.0,
data stream strings and BLOBs have an 8-bit alignment requirement.

* Decodes CTF packets of any size and event records of any size with
steady memory usage and performance.

* Offers a memory mapped file view data source, but you can also
implement your own data source.

* It's safe to use two different iterators on the same element sequence
in two different threads.

* Full CTF metadata text inspection and validation: a parsing error
exception contains a list of locations (offset in bytes, line number,
and column number) with associated context messages.

* Full data stream packet decoding information with the
http://en.cppreference.com/w/cpp/concept/InputIterator[input iterator]
interface, for example:

** The decoded packet magic number.
** The decoded metadata stream UUID.
** The expected total length of the current packet.
** The expected content length of the current packet.
** The updated value of the default clock.
** The next data stream type to use.
** The next event record type to use.
** The data stream (instance) ID.

* "`Seek packet`" iterator method to seek the beginning of an element
known to be located at a specific offset, in bytes, within the same
element sequence.
+
You can use this method with non-CTF packet index information, for
example http://lttng.org/[LTTng]'s packet index files (not directly
supported by yactfr).

* Decoding error reporting with a precise message, the offset (in bits)
  within the element sequence where this message applies, and other
  relevant properties.

* Performance (`nop` iteration loop) is similar to
https://diamon.org/babeltrace/[Babeltrace]{nbsp}1.5.3's (`-o dummy`)
with a release <<build,build>>.
+
I don't publish numbers: experiment by yourself to confirm or deny this
claim.
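
To illustrate the zero-copy string/BLOB point above, here's a minimal,
standalone sketch. It's _not_ yactfr code: `DataBlock`, `RawPart`, and
`forEachStringPart()` are hypothetical names which only exist for this
illustration. The sketch assumes that a null-terminated string can span
several data blocks, and reports each part as a pointer into the block
itself, without copying any byte.

[source,cpp]
----
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical types for illustration only: NOT part of the yactfr API.
struct DataBlock
{
    const char *data;
    std::size_t size;
};

struct RawPart
{
    const char *begin;  // points into a data block: nothing is copied
    std::size_t len;    // length in bytes (null terminator excluded)
};

/*
 * Calls `onPart` for each raw part of the null-terminated string
 * which starts at byte `offset` of the first block and possibly
 * continues in the following blocks.
 *
 * Returns whether or not the null terminator was found.
 */
template <typename PartFunc>
bool forEachStringPart(const DataBlock * const blocks,
                       const std::size_t blockCount,
                       std::size_t offset, PartFunc onPart)
{
    for (std::size_t i = 0; i < blockCount; ++i) {
        const auto& block = blocks[i];

        assert(offset <= block.size);

        const char * const begin = block.data + offset;
        const std::size_t avail = block.size - offset;
        const auto nul = static_cast<const char *>(
            std::memchr(begin, '\0', avail));

        if (nul) {
            // terminator within this block: this is the last part
            onPart(RawPart {begin, static_cast<std::size_t>(nul - begin)});
            return true;
        }

        // no terminator yet: the remainder of this block is one part
        onPart(RawPart {begin, avail});
        offset = 0;
    }

    return false;
}
----

The callback receives pointers into the original blocks, so the caller
decides whether to copy the parts, print them, or ignore them.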

== Limitations

Note that after a few years of working with all sorts of real life CTF
traces, I've never found one that yactfr refused to decode. Still, to be
honest about its limits:

* Only builds on a Linux/Unix platform.
+
The non-portable part is the memory mapped file source which uses system
functions such as `open()`, `close()`, `fstat()`, `mmap()`, and
`madvise()`.

* Decodes up to, and including, 64-bit signed/unsigned CTF fixed-length
bit arrays and integers (no "`big integers`").

* Decodes up to, and including, 63-bit (effective, which means
72{nbsp}total) signed/unsigned CTF variable-length integers.

* Only decodes 32-bit and 64-bit CTF fixed-length floating point
numbers.

* Packet lengths must be multiples of 8 bits (I'm still not sure, but
this could be enforced by the specification anyway), and be at least
8{nbsp}bits.

* TSDL (CTF{nbsp}1.8): Doesn't handle single-line comment continuation
when the line ends with `\`:
+
--
----
event {
    name = "hello"; // this is a comment \
    id = 23; we're still in the comment started above here
    id = 42;
    ...
};
----
--

* TSDL (CTF{nbsp}1.8): Doesn't support relative dynamic-length array
type lengths and variant type selectors in data type aliases (or named
structure/variant types) which target structure member types outside
this data type alias.
+
For example, this is not supported (TSDL):
+
--
----
fields := struct {
    int len;

    typealias struct {
        int sequence[len];
    } := my_struct;

    struct {
        int len;
        my_struct a_struct;
    } field;
};
----
--
+
This is also not supported (TSDL):
+
--
----
fields := struct {
    enum {
        ...
    } tag;

    variant my_variant <tag> {
        ...
    } a_variant;

    my_variant the_variant;
};
----
--
+
The example above would work, however, if the selector location of the
variant type were absolute:
+
--
----
fields := struct {
    enum {
        ...
    } tag;

    variant my_variant <event.fields.tag> {
        ...
    } a_variant;

    my_variant the_variant;
};
----
--

* API and ABI backward compatibility is not guaranteed at this point.
+
Please rebuild your project if you change the yactfr version.
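
To see where the "`63-bit effective, 72{nbsp}total`" figure above comes
from, here's a minimal, standalone sketch which assumes that a
variable-length integer is LEB128-encoded: each byte carries 7{nbsp}value
bits plus one continuation bit, so 9{nbsp}bytes hold 9 × 7 = 63 effective
bits within 9 × 8 = 72 total bits. `VarUInt` and `decodeUnsignedLeb128()`
are hypothetical names, not part of the yactfr API.

[source,cpp]
----
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration only: NOT part of the yactfr API.
struct VarUInt
{
    std::uint64_t value;    // decoded value
    std::size_t effBits;    // effective length: 7 bits per byte
    std::size_t totalBits;  // total length in the stream: 8 bits per byte
    bool ok;                // false if truncated or longer than 9 bytes
};

// Decodes an unsigned LEB128 integer of at most 9 bytes from `buf`.
VarUInt decodeUnsignedLeb128(const std::uint8_t * const buf,
                             const std::size_t size)
{
    constexpr std::size_t maxBytes = 9;  // 9 x 7 = 63 effective bits
    VarUInt res {0, 0, 0, false};

    for (std::size_t i = 0; i < size && i < maxBytes; ++i) {
        // low 7 bits are value bits, least significant group first
        res.value |= static_cast<std::uint64_t>(buf[i] & 0x7f) << (7 * i);
        res.effBits += 7;
        res.totalBits += 8;

        if ((buf[i] & 0x80) == 0) {
            // cleared continuation bit: this is the last byte
            res.ok = true;
            break;
        }
    }

    return res;
}
----

With this sketch, a tenth byte is never read: the `ok` member stays
`false` when the ninth byte still has its continuation bit set.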

[[build]]
== Build and install yactfr

Make sure you have the build time requirements:

* Linux/Unix platform
* https://cmake.org/[CMake] ≥ 3.10.0
* pass:[C++14] compiler
* http://www.boost.org/[Boost] ≥ 1.58
* **If you build the API documentation**: http://www.stack.nl/~dimitri/doxygen/[Doxygen]

.Build and install yactfr from source
----
$ git clone https://github.com/eepp/yactfr
$ cd yactfr
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=release ..
$ make
# make install
----

You can specify your favorite C and pass:[C++] compilers with the usual
`CC` and `CXX` environment variables when you run `cmake`, and
additional options with `CFLAGS` and `CXXFLAGS`.

Specify `-DOPT_BUILD_DOC=YES` to `cmake` to enable the HTML API
documentation build (requires Doxygen). The documentation is available
in `__BUILD__/doc/api/output/html`, where `__BUILD__` is your build
directory.

Specify `-DCMAKE_INSTALL_PREFIX=__PREFIX__` to `cmake` to install yactfr
to the `__PREFIX__` directory instead of the default `/usr/local`
directory.

For example, this is how I run `cmake` for development:

----
$ CC=clang CXX=clang++ CXXFLAGS='-Wextra -Wall -pedantic' \
cmake .. -DCMAKE_BUILD_TYPE=debug -DOPT_BUILD_DOC=ON
----

For production, you should make a release build:

----
$ CC=clang CXX=clang++ \
cmake .. -DCMAKE_BUILD_TYPE=release -DOPT_BUILD_DOC=ON
----

== Run the tests

Once you have <<build,built>> the project in the build directory, you
can run the tests. You need Python{nbsp}3 and
https://pytest.org/[pytest].

.Run the yactfr tests from the build directory.
----
$ make check
----

If you're in a hurry and you have the
https://pypi.org/project/pytest-xdist/[pytest-xdist] package, you can
parallelize the testing process. You need to set the `YACTFR_BINARY_DIR`
environment variable to the build directory (absolute path), for
example:

.Run the yactfr tests in parallel from the build directory.
----
$ make tests
$ YACTFR_BINARY_DIR=$(pwd) pytest -n logical
----

== Usage examples

In the examples below, the program accepts two arguments:

. The path to the metadata stream file of the trace (required).

. The path to a data stream file of the same trace (required by some
  examples).

<<build,Build>> the API documentation for a thorough reference.

NOTE: The examples are not necessarily optimal: their purpose is to show
what the yactfr API looks like.

.Print all the data stream's event record names.
====
[source,cpp]
----
#include <cassert>
#include <iostream>
#include <fstream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the event record names
    for (auto& elem : seq) {
        if (elem.isEventRecordInfoElement()) {
            auto& erInfo = elem.asEventRecordInfoElement();

            // the name of an event record type is optional
            if (erInfo.type()->name()) {
                std::cout << *erInfo.type()->name() << std::endl;
            }
        }
    }
}
----
====

.Print all the fixed-length signed integers of the `sched_switch` event records and their offset.
====
[source,cpp]
----
#include <cassert>
#include <iostream>
#include <fstream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the fixed-length signed integers of the `sched_switch` ERs
    const auto endIt = seq.end();
    bool inSchedSwitchEventRecord = false;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isEventRecordInfoElement()) {
            auto& ertElem = it->asEventRecordInfoElement();

            // the name of an event record type is optional
            if (ertElem.type()->name() && *ertElem.type()->name() == "sched_switch") {
                std::cout << "---" << std::endl;
                inSchedSwitchEventRecord = true;
            } else {
                inSchedSwitchEventRecord = false;
            }

            continue;
        }

        if (inSchedSwitchEventRecord && it->isFixedLengthSignedIntegerElement()) {
            std::cout << it.offset() << ": ";

            auto& intElem = it->asFixedLengthSignedIntegerElement();

            if (intElem.structureMemberType()) {
                std::cout << intElem.structureMemberType()->displayName() << ": ";
            }

            std::cout << intElem.value() << std::endl;
        }
    }
}
----
====

.Print all the packet offsets and lengths (both in bits): slow version.
====
In this example, we iterate _all_ the elements of the data stream. The
next example shows how to do the same faster.

[source,cpp]
----
#include <cassert>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    for (auto it = seq.begin(); it != endIt; ++it) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketEndElement()) {
            // back to first level: end of packet
            const auto pktLen = it.offset() - curPktOffset;

            std::cout << "Packet #" << curPktNumber << ": " <<
                         "Offset: " << std::setw(10) << curPktOffset << " " <<
                         "Size: " << std::setw(10) << pktLen <<
                         std::endl;
            ++curPktNumber;
        }
    }
}
----
====

.Print all the packet offsets and lengths (both in bits): fast version.
====
This is a faster version of the previous example.

Instead of decoding the whole packet to find its length, we use the
expected packet total length property of the "`packet info`" element.
This element is available after the decoder reads the packet context.
Then, we make the iterator seek the next packet directly.

Note that this example doesn't work if the packet context type doesn't
contain an expected packet total length fixed-length unsigned integer,
in which case the data stream _must_ contain a single packet. This could
be detected by inspecting the metadata (trace type) and using the size
of the whole data stream file as the unique packet total length.

[source,cpp]
----
#include <cassert>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <yactfr/yactfr.hpp>

int main(const int argc, const char * const argv[])
{
    assert(argc == 3);

    // open metadata stream file
    std::ifstream metadataFile {argv[1], std::ios::binary};

    // create metadata stream object
    const auto metadataStream = yactfr::createMetadataStream(metadataFile);

    // we have the metadata text at this point: safe to close the file
    metadataFile.close();

    // get a trace type from the metadata text
    auto traceTypeMsUuidPair = yactfr::fromMetadataText(metadataStream->text());

    // create a memory mapped file view factory to read the data stream file
    yactfr::MemoryMappedFileViewFactory factory {argv[2]};

    // create an element sequence from the trace type and data source factory
    yactfr::ElementSequence seq {*traceTypeMsUuidPair.first, factory};

    // print all the packet offsets and lengths (both in bits)
    const auto endIt = seq.end();
    auto it = seq.begin();
    yactfr::Index curPktOffset = 0;
    unsigned long curPktNumber = 0;

    while (it != endIt) {
        if (it->isPacketBeginningElement()) {
            // save packet beginning offset
            curPktOffset = it.offset();
        } else if (it->isPacketInfoElement()) {
            // this element contains the expected total length of the current packet
            auto& elem = it->asPacketInfoElement();

            assert(elem.expectedTotalLength());
            std::cout << "Packet #" << curPktNumber << ": " <<
                         "Offset: " << std::setw(10) << curPktOffset << " " <<
                         "Size: " << std::setw(10) << *elem.expectedTotalLength() <<
                         std::endl;
            ++curPktNumber;

            /*
             * Seek the next packet without iterating the intermediate
             * elements. The expected offset is in bytes, so we need to
             * divide what we have by 8.
             */
            it.seekPacket((curPktOffset + *elem.expectedTotalLength()) / 8);
            continue;
        }

        ++it;
    }
}
----
====
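
The offset arithmetic of the fast version, including the single-packet
fallback mentioned above, can be sketched in isolation as follows. This
is a standalone illustration under stated assumptions, not yactfr code:
`packetTotalLengthBits()` and `nextPacketOffsetBytes()` are hypothetical
helpers, and a null pointer stands for "`the packet context has no
expected total length`".

[source,cpp]
----
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only: NOT part of the yactfr API.

/*
 * Returns the total length of the current packet, in bits: the
 * expected total length when the packet context provides one, or the
 * size of the whole data stream file otherwise (the data stream then
 * contains a single packet).
 */
std::uint64_t packetTotalLengthBits(
    const std::uint64_t * const expectedTotalLenBits,
    const std::uint64_t fileSizeBytes)
{
    return expectedTotalLenBits ? *expectedTotalLenBits : fileSizeBytes * 8;
}

/*
 * Returns the offset of the next packet, in bytes, from the offset and
 * total length of the current packet, both in bits.
 */
std::uint64_t nextPacketOffsetBytes(const std::uint64_t curPktOffsetBits,
                                    const std::uint64_t pktTotalLenBits)
{
    // packet lengths are multiples of 8 bits and at least 8 bits
    assert(curPktOffsetBits % 8 == 0);
    assert(pktTotalLenBits % 8 == 0 && pktTotalLenBits >= 8);

    // bits to bytes: this is the unit a byte-based seek expects
    return (curPktOffsetBits + pktTotalLenBits) / 8;
}
----

For example, a 32,768-bit packet which starts at offset 0 puts the next
packet at byte 4096.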

== Contribute and report bugs

Please contribute with GitHub pull requests and report bugs as GitHub
issues.

== Community

See https://eepp.ca/[eepp.ca].

I'm `eepp` on https://libera.chat/[Libera.Chat] and
https://oftc.net/[OFTC].