An open API service indexing awesome lists of open source software.

https://github.com/buoyant-data/hotdog

Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into Apache Kafka as quickly as possible.
https://github.com/buoyant-data/hotdog

async kafka rust syslog

Last synced: about 2 months ago
JSON representation

Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into Apache Kafka as quickly as possible.

Awesome Lists containing this project

README

          

ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
:toc: macro

= 🌭 Hotdog!

Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into
link:https://kafka.apache.org[Apache Kafka]
as quickly as possible.

It listens for syslog messages over plaintext or TLS-encrypted TCP connection
and depending on the defined <> it will route and even modify messages
on their way into a <>.

toc::[]

== Features

* syslog over plaintext or TLS-encrypted TCP connections.
* <> and <> for matching, modifying, and routing syslog
messages based on the message content.
* Rich integration with Kafka with <> for
link:https://github.com/edenhill/librdkafka[librdkafka]
* Built-in <> for daemon health reporting.

[source,bash]
----
Hotdog 1.0.0
R Tyler Croy
Forward syslog with ease

USAGE:
hotdog [OPTIONS]

FLAGS:
-h, --help Prints help information
-V, --version Prints version information

OPTIONS:
-c, --config Sets a custom config file [default: hotdog.yml]
-t, --test Test a log file against the configured rules
----

[[install]]
== Installation

Hotdog can be installed by grabbing a
link:https://github.com/reiseburo/hotdog/releases[released binary].
The system which will run `hotdog` *must* have `libsasl2` installed, e.g.:

.Ubuntu
[source,bash]
----
sudo apt-get install libsasl2-2
----

.openSUSE
[source,bash]
----
sudo zypper install cyrus-sasl-devel
----

[[performance]]
=== Performance

By default `hotdog` will run with a single background thread for processing
incoming messages. It is recommended to set `SMOL_THREADS` to the number of
CPUs which should be utilized on the machine.

[[configuration]]
== Configuration

Hotdog is configured by the `hotdog.yml` file, which has a very fluid syntax at
the moment. The two main sections are the `global` and `rules` blocks.

Rules defined in the configuration can be tested against an example log file in
order to verify that the right rules are matching the expected log inputs, for
example:

[source,bash]
----
❯ RUST_LOG=info ./target/debug/hotdog -t example.log
Line 1 matches on:
- Regex: ^hello\s+(?P\w+)?
- Regex: .*
Line 2 matches on:
- Regex: .*
Line 3 matches on:
- Regex: .*
Line 4 matches on:
- JMESPath: meta.topic
- Regex: .*
----

[[global]]
=== Global

The `global` configuration configures `hotdog` itself. The <>, <>, and <> keys are all
required by default in order for `hotdog` to start properly.

[[yml-listen]]
==== Listen

The `global.listen` configuration is required and will determine on which
address and port `hotdog` will listen. The <>
configuration key is required to function as well. When `tls` is left blank,
`hotdog` will listen for syslog messages in plaintext on the specified `port`.

.hotdog.yml
[source,yaml]
----
global:
listen:
address: '127.0.0.1'
port: 1514
tls:
----

[[yml-listen-tls]]
===== TLS

The `global.listen.tls` configuration section can be used to enable
syslog-over-TLS support from `hotdog`. Currently the only two valid keys for
this section are `cert` and `key`, both of which should be absolute or relative
paths to PEM-encoded files on disk.

Certificate and Key files can be created with `certtool --generate-privkey
--outfile ca-key.pem`

.hotdog.yml
[source,yaml]
----
global:
listen:
tls:
cert: './a/path.crt'
key: './a/path.key'
# ca is optional and when provided will ensure certificate validation
# happens
ca: './a/ca.crt'
----

[[yml-status]]
==== Status

The `global.status` is an optional configuration entry which will enable the
launching of an HTTP status server on the specified `addresss` and `port`.

JSON formatted statistics can be retrieved on `/stats`.

.hotdog.yml
[source,yaml]
----
global:
status:
address: '127.0.0.1'
port: 8585
----

[[yml-kafka]]
==== Kafka

A `global.kafka` configuration is required in order for `hotdog` to function
properly. The two main configuration values are <> and <>.

.hotdog.yml
[source,yaml]
----
global:
kafka:
conf:
bootstrap.servers: 'localhost:9092'
client.id: 'hotdog'
topic: 'logs'
----

[[yml-kafka-buffer]]
===== Buffer

**Default:** `1024`

`global.kafka.buffer` may contain a number indicating the size of the internal
queue for sending messages to Kafka. This queue represents the number of
internal messages `hotdog` will buffer during Kafka availability issues.

This value is *not* the same as the librdkafka `queue.buffering.max.messages`
configuration, which governs the number of in-flight messages which can be sent
at any given time to the Kafka broker(s). To set that variable, include it in
the <> section documented below.

[CAUTION]
====
If the internal Kafka queue has been filled up, new log lines received by
`hotdog` will be discarded.
====

[[yml-kafka-conf]]
===== Conf

`global.kafka.conf` should contain a map of
link:https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md[librdkafka configuration values].
`hotdog` will expect every key _and_ value to be a String. These configuration
values are passed right on to the underlying librdkafka client connection, so
whatever librdkafka supports, `hotdog` supports!

[[yml-kafka-timeout_ms]]
===== timeout_ms

**Default:** `30_000`

`global.kafka.timeout_ms` is an optional configuration which defines the
timeout in milliseconds for `hotdog` to make an initial connection to the
configured Kafka brokers.

[[yml-kafka-topic]]
===== Topic

`global.kafka.topic` may contain a string value which is to be considered the
"default topic" for the <>.

[[yml-parquet]]
==== Parquet

The link:https://parquet.apache.org[Apache Parquet] sink allows for directly
writing to an
link:https://docs.rs/object_store/latest/object_store/index.html[object_store]
supported `url`

[source,yaml]
----
global:
parquet:
url: 's3://hotdog/streams/'
# Bytes to buffer
buffer: 1024000
flush_ms: 60000
----

[TIP]
====
The `url` can be omitted from the configuration and specified in the environment via `S3_OUTPUT_URL`
====

[[yml-metrics]]
==== Metrics

The `global.metrics` configuration tells `hotdog` where to send its own
internal metrics The only _currently_ supported metrics format is
link:https://github.com/statsd/statsd[statsd].

If your environment doesn't use statsd or you do not wish to report metrics,
set the `statsd` value to an invalid host and port.

.hotdog.yml
[source,yaml]
----
global:
metrics:
statsd: 'localhost:8125'
----

[[yml-status]]
==== Status

The `global.status` configuration is fully _optional_ but when it is enabled `hotdog`
will spin up an HTTP server on the configured `address` and `port` in order to provide
real-time status information about the daemon's runtime to HTTP clients.

.hotdog.yml
[source,yaml]
----
global:
status:
address: 'localhost'
port: 8585
----

[[rules]]
=== Rules

Hotdog's rules define how it should handle and route the syslog messages it
receives. In the `hotdog.yml`, the rules must be defined as an array of maps.

Each rule is expected to a "matcher" (either <> or
<>), the `field` upon which the matcher should
apply, and the <> defining how the message should be
handled.

.hotdog.yml
[source,yaml]
----
rules:
- jmespath: 'meta.topic'
field: msg
actions:
- type: forward
topic: '{{value}}'

# Catch-all, send everything else to a "logs-unknown" topic
- regex: '.*'
field: msg
actions:
- type: forward
topic: 'logs-unknown'
----

.Supported Fields
|===
| Name | Notes

| `msg`
| The actual message sent along from the syslog server

| `hostname`
| The sender's hostname, if available.

| `appname`
| The logging application, if available, which created the syslog entry

| `facility`
| The syslog logging facility, if available, which was used to create the syslog message. For example `kern`, `user`, `auth`, etc.

| `severity`
| The severity of the syslog message, if available. For example: `notice`, `err`, `crit`, etc.

|===

[[rules-regex]]
==== Matching with regular expressions

The `regex` matcher instructs `hotdog` to match the `field` against the defined
regular expression, which must follow the syntax of the
link:https://docs.rs/regex/1.3.7/regex/#syntax[regex crate].

The matcher supports named groups in the regular expression, which are then exposed to actions such as
<> and <>.

[CAUTION]
====
Named groups will **override** any built-in variables at the time of
substitution, so be careful you are not naming your groups anything which might
overlap with the built-in variable names
====

[[rules-jmespath]]
==== Matching with JMESPath

`hotdog` also supports matching on JSON based messages with
link:https://jmespath.org/[JMESPath] via the `jmespath` matcher. In order for a
match, the log message must be a valid JSON object or array. The value of the
match is also then exposed as a <> named `value`, which
can be used in actions such as <> or <>.

[[variables]]
==== Variables

Some actions, such as <>, can perform variable substitutions on
log line. The variables available are a combination of the built-in variables
listed below, and whatever named groups exist in the `regex` field of the
<>.

[[builtin-vars]]
.Built-in Variables
|===
| Name | Description

| `msg`
| The original log line message sent along from the syslog sender.

| `version`
| The version of `hotdog` which is processing the message.

| `iso8601`
| The ISO-8601 timestamp of when the message was processed.

|===

[[actions]]
==== Actions

Actions determine what `hotdog` should do with the given log line when it
receives it.

[[action-forward]]
===== Forward

The forward action implies the <> when used, since
the internally tracked `output` buffer is flushed when it is sent to Kafka.

[[action-merge]]
===== Merge

The `merge` action will only work when the log line is a JSON **object**. JSON
arrays, or other arbitrary strings will not merge properly, and cause **all**
subsequent actions for the given rule to be aborted.

.Parameters
|===
| Key | Value

| `json`
| A YAML map which will be merged with the JSON object deserialized from the matched log line.

|===

.hotdog.yml
[source,yaml]
----
actions:
- type: merge
json:
meta:
hotdog:
version: '{{version}}'
timestamp: '{{iso8601}}'
----

[[action-replace]]
===== Replace

The `template` may utilize the <> in
order to generate a modified message. The output is only available to
subsequent actions defined _after_ the `replace` action. Subsequent rules in
the chain **will not** utilize this generated message.

.Parameters
|===
| Key | Value

| `template`
| A link:https://handlebarsjs.com/[Handlebars]-style template which can be used to output a modified message.

|===

.hotdog.yml
[source,yaml]
----
- regex: '^hello\s+(?P\w+)?'
actions:
- type: replace
template: |
Why hello there {{name}}!
----

[[action-stop]]
===== Stop

The `stop` action does nothing more than stop processing on the message. It is
not particularly useful except in cases where `hotdog` should match on a
message and then effectively discard it.

[[metrics]]
== Metrics

`hotdog` is designed to emit Statsd metrics to the statsd endpoint configured
in the <> section. Each metric will be prefixed under `hotdog.*`.

|===
| Key | Description

| `hotdog.connections`
| Gauge tracking the number of connections

| `hotdog.lines`
| Counter tracking the number of lines received by `hotdog`

| `hotdog.kafka.submitted`
| Counter tracking the number of messages submitted to Kafka

| `hotdog.kafka.submitted.`
| Counter tracking the number of messages submitted to each Kafka topic

| `hotdog.kafka.producer.sent`
| Timer which tracks the amount of time it takes to actually write messages to Kafka

| `hotdog.kafka.producer.error.*`
| Counters which count the number of different errors encountered while sending messages to Kafka. The types of possible metric names depends on the link:https://docs.rs/rdkafka/0.23.1/rdkafka/error/enum.RDKafkaError.html[RDKafkaError] enumeration from the underlying library.

| `hotdog.error.log_parse`
| Number of the log lines received which could not be parsed as link:https://tools.ietf.org/html/rfc5424[RFCC 5424] syslog lines.

| `hotdog.error.full_internal_queue`
| Count tracking the number of log lines which were *dropped* due to a full internal queue, Typically indicates an issue between `hotdog` and the Kafka brokers.

| `hotdog.error.internal_push_failed`
| Number of lines dropped because the could not be sent into the internal queue.

| `hotdog.error.topic_parse_failed`
| Number of lines dropped because the configured dynamic topic could not be parsed properly (typically indicates a configuration error).

| `hotdog.error.merge_of_invalid_json`
| Count of lines which could not have a merge action applied as configured due to a configuration error

| `hotdog.error.merge_target_not_json`
| Count of lines received for a merge action which were not JSON, and therefore could not be merged.

|===

[[development]]
== Development

Hotdog is tested against the latest Rust stable. A simple `cargo build` should
compile a working `hotdog` binary for your platform.

On Linux systems it is easy to test with:

[source,bash]
----
logger --server 127.0.0.1 -T -P 1514 "hello world"
logger --server 127.0.0.1 -T -P 1514 -f example.log
----

For TLS connections, you can use the `openssl` `s_client` command:

[source,bash]
----
echo '<13>1 2020-04-18T15:16:09.956153-07:00 coconut tyler - - [timeQuality tzKnown="1" isSynced="1" syncAccuracy="505061"] hello world' | openssl s_client -connect localhost:6514
----

=== Profiling

Profiling `hotdog` is best done on a Linux host with the `perf` tool, e.g.

[source,bash]
----
RUST_LOG=info perf record --call-graph dwarf -- ./target/debug/hotdog -c ./hotdog.yml
perf report -ng --no-inline
----

By default this may run with a single thread, to increase the parallelism of
:hotdog: while profiling, be sure to use the `SMOL_THREADS` environment
variable.

The [hotspot](https://github.com/KDAB/hotspot) profiler visualizer tool works
well with the generated repors.

== Similar Projects

`hotdog` was originally motivated by challenges with
link:https://github.com/rsyslog/rsyslog[rsyslog], a desire for a simple
configuration, and the need for built-in metrics.

Some other similar projects which can be used to get logs into Kafka:

* link:https://github.com/elastic/logstash[logstash]
* link:https://github.com/syslog-ng/syslog-ng[syslog-ng]
* link:https://github.com/timberio/vector[vector]
* link:https://github.com/uswitch/syslogger[syslogger], which doesn't process
messages itself, but rather integrates with `rsyslog`.