[![pipeline status](https://gitlab.esss.lu.se/ecdc/ess-dmsc/kafka-to-nexus/badges/main/pipeline.svg)](https://gitlab.esss.lu.se/ecdc/ess-dmsc/kafka-to-nexus/-/commits/main)
[![coverage report](https://gitlab.esss.lu.se/ecdc/ess-dmsc/kafka-to-nexus/badges/main/coverage.svg)](https://gitlab.esss.lu.se/ecdc/ess-dmsc/kafka-to-nexus/-/commits/main)
[![DOI](https://zenodo.org/badge/81435658.svg)](https://zenodo.org/badge/latestdoi/81435658)

# Kafka to NeXus File-Writer

Writes NeXus files from experiment data streamed through Apache Kafka.
Part of the ESS data streaming pipeline.

Note: development of this project has moved to https://gitlab.esss.lu.se/ecdc/ess-dmsc/kafka-to-nexus.

## Usage

```
-h,--help                     Print this help message and exit
--version                     Print application version and exit
--brokers TEXT                Comma-separated list of Kafka brokers.
--command-status-topic TEXT REQUIRED
                              Kafka topic to listen for commands and to push
                              status updates to
--job-pool-topic TEXT REQUIRED
                              Kafka topic to listen for jobs
--graylog-logger-address URI  Log to Graylog via the graylog_logger library
--grafana-carbon-address URI  Address of the Grafana (Carbon) metrics service.
-v,--verbosity                Set log message level. Set to 0 - 5 or one of
                              `Debug`, `Info`, `Warning`, `Error` or
                              `Critical`. Ex: "-v Debug". Default: `Error`
--hdf-output-prefix TEXT      Directory which gets prepended to the HDF output
                              filenames in the file write commands
--log-file TEXT               Specify file to log to
--service-name TEXT           Used to generate the service identifier and as
                              an extra metrics ID string. Will make the
                              metrics names take the form:
                              "kafka-to-nexus.[host-name].[service-name].*"
--list_modules                List registered reader and writer parts of
                              file-writing modules and then exit.
--status-master-interval      Interval between status updates. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--time-before-start           Pre-consume messages this amount of time. Ex:
                              "10s". Accepts "h", "m", "s" and "ms".
--time-after-stop             Allow for this much leeway after the stop time
                              before stopping message consumption. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--kafka-metadata-max-timeout  Max timeout for Kafka metadata calls. Note:
                              metadata calls block the application. Ex: "10s".
                              Accepts "h", "m", "s" and "ms".
--kafka-error-timeout         Amount of time to wait for recovery from a
                              Kafka error before abandoning the stream. Ex:
                              "10s". Accepts "h", "m", "s" and "ms".
--kafka-poll-timeout          Amount of time to wait for a new Kafka message.
                              *WARNING* Should generally not be changed from
                              the default. Increase "--kafka-error-timeout"
                              instead. Ex: "10s". Accepts "h", "m", "s" and
                              "ms".
--data-flush-interval         (Max) amount of time between flushes of data to
                              file. Ex: "10s". Accepts "h", "m", "s" and "ms".
--max-queued-writes           Maximum number of messages buffered for
                              writing. Directly affects the memory usage of
                              the application. The maximum is not enforced,
                              only used as a guideline to throttle Kafka
                              consumption. Note that total memory usage will
                              also depend on the size of the actual messages
                              consumed from Kafka.
-X,--kafka-config KEY VALUE ...
                              librdkafka options
-c,--config-file              Read configuration from an INI file
```
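
For example, a minimal invocation with the required options might look like the following sketch. The broker address, topic names, and output directory are placeholder values, and the binary path assumes the build layout described below:

```bash
./bin/kafka-to-nexus \
  --brokers localhost:9092 \
  --command-status-topic command-topic \
  --job-pool-topic job-pool-topic \
  --hdf-output-prefix /data/nexus-files
```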

### Configuration Files

The file-writer can be configured from a file via `--config-file`, which mirrors the command-line options.

For example:

```ini
brokers=localhost:9092
command-status-topic=command-topic
job-pool-topic=job-pool-topic
hdf-output-prefix=./absolute/or/relative/path/to/hdf/output/directory
service-name=this_is_filewriter_instance_HOST_PID_EXAMPLENAME
time-before-start=123456ms
kafka-config=consumer.timeout.ms 501 fetch.message.max.bytes 1234 api.version.request true
```
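
Assuming the file is saved as `config.ini` (a placeholder name), the file-writer would then be started with:

```bash
./bin/kafka-to-nexus --config-file config.ini
```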

Note: the Kafka options are key-value pairs, and multiple pairs can be passed by appending them after a single `--kafka-config` (or `-X`) option, as shown below.
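
A sketch of the command-line equivalent of the `kafka-config` line in the ini example above (topic names and broker address are again placeholders):

```bash
# Multiple key-value pairs follow a single -X/--kafka-config option
./bin/kafka-to-nexus --brokers localhost:9092 \
  --command-status-topic command-topic \
  --job-pool-topic job-pool-topic \
  -X consumer.timeout.ms 501 fetch.message.max.bytes 1234 api.version.request true
```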

### Sending commands to the file-writer

Beyond the configuration options given at start-up, the file-writer can be sent commands via Kafka to control the actual file writing.

See [commands](documentation/commands.md) for more information.

## Installation

The supported method for installation is via Conan.

### Prerequisites

The following minimum software is required to get started:

- Linux or macOS
- Conan
- CMake >= 3.1.0
- Git
- A C++17-compatible compiler (preferably GCC or Clang); GCC >= 8 and AppleClang >= 10 are sufficient, as complete C++17 support is not required
- Doxygen (only required if you would like to generate the documentation)

Conan will install all the other required packages.

### Add the Conan remote repositories

Follow the README [here](https://github.com/ess-dmsc/conan-configuration).

If you are not building on CentOS 7, set `compiler.libcxx` to `libstdc++11` in the settings section of your Conan profile:

```bash
# Assuming profile is named "default"
conan profile update settings.compiler.libcxx=libstdc++11 default
```

### Build

From within the file-writer's top directory:

```bash
mkdir _build
cd _build
conan install .. --build=missing
cmake ..
make
```

There are additional CMake flags for adjusting the build:
* `-DRUN_DOXYGEN=ON` to build the Doxygen documentation; also requires running `make docs` afterwards (see the example below)
* `-DHTML_COVERAGE_REPORT=ON` to generate an HTML unit test coverage report, output to `/coverage/index.html`
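
For example, a documentation-enabled build from the `_build` directory might look like this:

```bash
cmake -DRUN_DOXYGEN=ON ..
make
make docs   # generates the Doxygen documentation
```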

### Building and running the unit tests

From the build directory:

```bash
make UnitTests
./bin/UnitTests
```

### Running on macOS

When using Conan on macOS, due to the way paths to dependencies are handled,
the `activate_run.sh` script may need to be sourced before running the application. The
`deactivate_run.sh` script can be sourced to undo the changes afterwards.
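
A typical session might look like this (assuming the scripts were generated by Conan in the `_build` directory):

```bash
cd _build
source activate_run.sh    # put Conan-provided dependency paths in the environment
./bin/kafka-to-nexus --help
source deactivate_run.sh  # restore the previous environment
```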

### Domain tests

These tests exercise the core functionality of the program, but without Kafka. This makes them more stable than the integration
tests, as they do not require Docker and Kafka to be running.

### Integration tests

The integration tests are a series of automated tests that exercise the application in ways similar to how it is used
in production.

See [Integration Tests page](integration-tests/README.md) for more information.

## Documentation

See the `documentation` directory.

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) for information on submitting pull requests to this project.

## License

This project is licensed under the BSD 2-Clause "Simplified" License - see the [LICENSE.md](LICENSE.md) file for details.