https://github.com/buoyant-data/hotdog
Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into Apache Kafka as quickly as possible.
- Host: GitHub
- URL: https://github.com/buoyant-data/hotdog
- Owner: buoyant-data
- License: agpl-3.0
- Created: 2020-04-15T20:25:25.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2025-05-02T17:15:01.000Z (12 months ago)
- Last Synced: 2025-10-14T11:17:26.778Z (6 months ago)
- Topics: async, kafka, rust, syslog
- Language: Rust
- Homepage:
- Size: 511 KB
- Stars: 47
- Watchers: 2
- Forks: 8
- Open Issues: 5
Metadata Files:
- Readme: README.adoc
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
:toc: macro
= 🌭 Hotdog!
Hotdog is a syslog-to-Kafka forwarder which aims to get log entries into
link:https://kafka.apache.org[Apache Kafka]
as quickly as possible.
It listens for syslog messages over plaintext or TLS-encrypted TCP connections
and, depending on the defined <<rules,rules>>, it will route and even modify messages
on their way into a <<yml-kafka,configured Kafka broker>>.
toc::[]
== Features
* syslog over plaintext or TLS-encrypted TCP connections.
* <<rules-regex,Regular expressions>> and <<rules-jmespath,JMESPath>> for matching, modifying, and routing syslog
messages based on the message content.
* Rich integration with Kafka with <<yml-kafka-conf,pass-through configuration>> for
link:https://github.com/edenhill/librdkafka[librdkafka]
* Built-in <<metrics,statsd metrics>> for daemon health reporting.
[source,bash]
----
Hotdog 1.0.0
R Tyler Croy
Forward syslog with ease
USAGE:
hotdog [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-c, --config Sets a custom config file [default: hotdog.yml]
-t, --test Test a log file against the configured rules
----
[[install]]
== Installation
Hotdog can be installed by grabbing a
link:https://github.com/reiseburo/hotdog/releases[released binary].
The system which will run `hotdog` *must* have `libsasl2` installed, e.g.:
.Ubuntu
[source,bash]
----
sudo apt-get install libsasl2-2
----
.openSUSE
[source,bash]
----
sudo zypper install cyrus-sasl-devel
----
[[performance]]
=== Performance
By default `hotdog` will run with a single background thread for processing
incoming messages. It is recommended to set `SMOL_THREADS` to the number of
CPUs which should be utilized on the machine.
[[configuration]]
== Configuration
Hotdog is configured by the `hotdog.yml` file, which has a very fluid syntax at
the moment. The two main sections are the `global` and `rules` blocks.
Rules defined in the configuration can be tested against an example log file in
order to verify that the right rules are matching the expected log inputs, for
example:
[source,bash]
----
❯ RUST_LOG=info ./target/debug/hotdog -t example.log
Line 1 matches on:
- Regex: ^hello\s+(?P<name>\w+)?
- Regex: .*
Line 2 matches on:
- Regex: .*
Line 3 matches on:
- Regex: .*
Line 4 matches on:
- JMESPath: meta.topic
- Regex: .*
----
[[global]]
=== Global
The `global` configuration configures `hotdog` itself. The <<yml-listen,listen>>, <<yml-kafka,kafka>>, and <<yml-metrics,metrics>> keys are all
required by default in order for `hotdog` to start properly.
[[yml-listen]]
==== Listen
The `global.listen` configuration is required and will determine on which
address and port `hotdog` will listen. The <<yml-listen-tls,tls>>
configuration key is required to function as well. When `tls` is left blank,
`hotdog` will listen for syslog messages in plaintext on the specified `port`.
.hotdog.yml
[source,yaml]
----
global:
  listen:
    address: '127.0.0.1'
    port: 1514
    tls:
----
[[yml-listen-tls]]
===== TLS
The `global.listen.tls` configuration section can be used to enable
syslog-over-TLS support from `hotdog`. The required keys for
this section are `cert` and `key`, both of which should be absolute or relative
paths to PEM-encoded files on disk. An optional `ca` key enables certificate
validation.
Key files can be created with `certtool --generate-privkey --outfile ca-key.pem`;
certificates are then generated from the key.
.hotdog.yml
[source,yaml]
----
global:
  listen:
    tls:
      cert: './a/path.crt'
      key: './a/path.key'
      # ca is optional and when provided will ensure certificate validation
      # happens
      ca: './a/ca.crt'
----
[[yml-status]]
==== Status
The `global.status` configuration is optional. When present, `hotdog` launches
an HTTP status server on the specified `address` and `port`, which provides
real-time information about the daemon's runtime.
JSON-formatted statistics can be retrieved from `/stats`.
.hotdog.yml
[source,yaml]
----
global:
  status:
    address: '127.0.0.1'
    port: 8585
----
[[yml-kafka]]
==== Kafka
A `global.kafka` configuration is required in order for `hotdog` to function
properly. The two main configuration values are <<yml-kafka-conf,conf>> and <<yml-kafka-topic,topic>>.
.hotdog.yml
[source,yaml]
----
global:
  kafka:
    conf:
      bootstrap.servers: 'localhost:9092'
      client.id: 'hotdog'
    topic: 'logs'
----
[[yml-kafka-buffer]]
===== Buffer
**Default:** `1024`
`global.kafka.buffer` may contain a number indicating the size of the internal
queue for sending messages to Kafka. This queue represents the number of
internal messages `hotdog` will buffer during Kafka availability issues.
This value is *not* the same as the librdkafka `queue.buffering.max.messages`
configuration, which governs the number of in-flight messages which can be sent
at any given time to the Kafka broker(s). To set that variable, include it in
the <<yml-kafka-conf,conf>> section documented below.
[CAUTION]
====
If the internal Kafka queue has been filled up, new log lines received by
`hotdog` will be discarded.
====
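The drop-on-full behavior described above can be sketched with a bounded queue. This is an illustrative Python model and not `hotdog`'s actual Rust implementation: `BoundedForwarder`, `offer`, and `dropped` are hypothetical names, and `capacity` merely stands in for `global.kafka.buffer`.

```python
import queue


class BoundedForwarder:
    """Toy model of a fixed-size internal queue that discards lines when full."""

    def __init__(self, capacity: int = 1024) -> None:
        # capacity plays the role of global.kafka.buffer (default 1024)
        self.pending: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, line: str) -> bool:
        """Enqueue a line without blocking; return False if it was discarded."""
        try:
            self.pending.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False


# With room for two lines and nothing draining the queue, the third is dropped.
forwarder = BoundedForwarder(capacity=2)
results = [forwarder.offer(f"line {n}") for n in range(3)]
```

In this model a slow or unreachable consumer never blocks the sender; it only raises the `dropped` count, which is the trade-off the caution above describes.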
[[yml-kafka-conf]]
===== Conf
`global.kafka.conf` should contain a map of
link:https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md[librdkafka configuration values].
`hotdog` will expect every key _and_ value to be a String. These configuration
values are passed right on to the underlying librdkafka client connection, so
whatever librdkafka supports, `hotdog` supports!
[[yml-kafka-timeout_ms]]
===== timeout_ms
**Default:** `30_000`
`global.kafka.timeout_ms` is an optional configuration which defines the
timeout in milliseconds for `hotdog` to make an initial connection to the
configured Kafka brokers.
[[yml-kafka-topic]]
===== Topic
`global.kafka.topic` may contain a string value which is to be considered the
"default topic" for the <<action-forward,forward action>>.
[[yml-parquet]]
==== Parquet
The link:https://parquet.apache.org[Apache Parquet] sink allows for writing
directly to any `url` supported by
link:https://docs.rs/object_store/latest/object_store/index.html[object_store].
[source,yaml]
----
global:
  parquet:
    url: 's3://hotdog/streams/'
    # Bytes to buffer
    buffer: 1024000
    flush_ms: 60000
----
[TIP]
====
The `url` can be omitted from the configuration and specified in the environment via `S3_OUTPUT_URL`
====
[[yml-metrics]]
==== Metrics
The `global.metrics` configuration tells `hotdog` where to send its own
internal metrics. The only _currently_ supported metrics format is
link:https://github.com/statsd/statsd[statsd].
If your environment doesn't use statsd or you do not wish to report metrics,
set the `statsd` value to an invalid host and port.
.hotdog.yml
[source,yaml]
----
global:
  metrics:
    statsd: 'localhost:8125'
----
[[rules]]
=== Rules
Hotdog's rules define how it should handle and route the syslog messages it
receives. In the `hotdog.yml`, the rules must be defined as an array of maps.
Each rule is expected to have a "matcher" (either <<rules-regex,`regex`>> or
<<rules-jmespath,`jmespath`>>), the `field` upon which the matcher should
apply, and the <<actions,`actions`>> defining how the message should be
handled.
.hotdog.yml
[source,yaml]
----
rules:
  - jmespath: 'meta.topic'
    field: msg
    actions:
      - type: forward
        topic: '{{value}}'
  # Catch-all, send everything else to a "logs-unknown" topic
  - regex: '.*'
    field: msg
    actions:
      - type: forward
        topic: 'logs-unknown'
----
.Supported Fields
|===
| Name | Notes
| `msg`
| The actual message sent along from the syslog server
| `hostname`
| The sender's hostname, if available.
| `appname`
| The logging application, if available, which created the syslog entry
| `facility`
| The syslog logging facility, if available, which was used to create the syslog message. For example `kern`, `user`, `auth`, etc.
| `severity`
| The severity of the syslog message, if available. For example: `notice`, `err`, `crit`, etc.
|===
[[rules-regex]]
==== Matching with regular expressions
The `regex` matcher instructs `hotdog` to match the `field` against the defined
regular expression, which must follow the syntax of the
link:https://docs.rs/regex/1.3.7/regex/#syntax[regex crate].
The matcher supports named groups in the regular expression, which are then exposed to actions such as
<<action-merge,merge>> and <<action-replace,replace>>.
[CAUTION]
====
Named groups will **override** any built-in variables at the time of
substitution, so be careful you are not naming your groups anything which might
overlap with the built-in variable names.
====
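The override described in the caution above can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for the Rust regex crate (both accept the `(?P<name>...)` group syntax), and it assumes that named groups are simply layered over the built-in variables; the sample values are invented.

```python
import re

# Built-in variables as listed in this README; the values are made-up samples.
builtins = {
    "msg": "hello world",
    "version": "1.0.0",
    "iso8601": "2020-04-18T15:16:09-07:00",
}

# A group deliberately (and unwisely) named `msg`, clashing with a built-in.
pattern = re.compile(r"^hello\s+(?P<msg>\w+)")
match = pattern.match(builtins["msg"])

# Later keys win in a dict merge, so the named group shadows the built-in.
variables = {**builtins, **match.groupdict()}
```

After the merge, `variables["msg"]` holds only the captured word rather than the full log line, which is why group names such as `msg`, `version`, and `iso8601` are best avoided.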
[[rules-jmespath]]
==== Matching with JMESPath
`hotdog` also supports matching on JSON based messages with
link:https://jmespath.org/[JMESPath] via the `jmespath` matcher. In order for a
match, the log message must be a valid JSON object or array. The value of the
match is also then exposed as a <<variables,variable>> named `value`, which
can be used in actions such as <<action-merge,merge>> or <<action-replace,replace>>.
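As a rough sketch of how a matched value becomes the `value` variable, the following Python handles only plain dotted key paths such as `meta.topic`. Real JMESPath expressions are far richer, and `match_path` is a hypothetical helper that is not part of `hotdog`.

```python
import json
from typing import Any, Optional


def match_path(line: str, path: str) -> Optional[Any]:
    """Return the value at a dotted key path, or None when there is no match.

    Only plain `a.b.c` paths are handled; full JMESPath supports far more.
    """
    try:
        current = json.loads(line)
    except json.JSONDecodeError:
        return None  # not JSON at all, so the matcher cannot apply
    if not isinstance(current, (dict, list)):
        return None  # valid JSON, but neither an object nor an array
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


value = match_path('{"meta": {"topic": "app-logs"}, "text": "hi"}', "meta.topic")
# The matched value can then be substituted wherever `{{value}}` appears.
topic = "{{value}}".replace("{{value}}", str(value))
no_match = match_path("plain text line", "meta.topic")
```

Here `topic` ends up as `app-logs`, while the non-JSON line produces no match and would fall through to later rules.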
[[variables]]
==== Variables
Some actions, such as <<action-replace,replace>>, can perform variable substitutions on
the log line. The variables available are a combination of the built-in variables
listed below, and whatever named groups exist in the `regex` field of the
<<rules-regex,matching rule>>.
[[builtin-vars]]
.Built-in Variables
|===
| Name | Description
| `msg`
| The original log line message sent along from the syslog sender.
| `version`
| The version of `hotdog` which is processing the message.
| `iso8601`
| The ISO-8601 timestamp of when the message was processed.
|===
[[actions]]
==== Actions
Actions determine what `hotdog` should do with the given log line when it
receives it.
[[action-forward]]
===== Forward
The forward action implies the <<action-stop,stop action>> when used, since
the internally tracked `output` buffer is flushed when it is sent to Kafka.
[[action-merge]]
===== Merge
The `merge` action will only work when the log line is a JSON **object**. JSON
arrays, or other arbitrary strings will not merge properly, and cause **all**
subsequent actions for the given rule to be aborted.
.Parameters
|===
| Key | Value
| `json`
| A YAML map which will be merged with the JSON object deserialized from the matched log line.
|===
.hotdog.yml
[source,yaml]
----
actions:
  - type: merge
    json:
      meta:
        hotdog:
          version: '{{version}}'
          timestamp: '{{iso8601}}'
----
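The effect of a merge can be sketched in Python as a recursive dictionary merge. This is an illustration only: `deep_merge` and `merge_action` are hypothetical names, variable substitution is assumed to have already happened, and the choice that configured values win on conflicting keys is an assumption rather than documented `hotdog` behavior.

```python
import json
from typing import Optional


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base`."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # assumption: configured values win on conflict
    return merged


def merge_action(line: str, extra: dict) -> Optional[str]:
    """Return the merged JSON line, or None when the line is not a JSON object."""
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None  # arrays and scalars cannot be merged
    return json.dumps(deep_merge(parsed, extra))


extra = {"meta": {"hotdog": {"version": "1.0.0"}}}
merged = merge_action('{"meta": {"topic": "app"}, "text": "hi"}', extra)
rejected = merge_action('["not", "an", "object"]', extra)
```

The existing `meta.topic` key survives alongside the new `meta.hotdog` map, while the JSON array is rejected, mirroring the case in which the remaining actions for the rule are aborted.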
[[action-replace]]
===== Replace
The `template` may utilize the <<variables,variables>> in
order to generate a modified message. The output is only available to
subsequent actions defined _after_ the `replace` action. Subsequent rules in
the chain **will not** utilize this generated message.
.Parameters
|===
| Key | Value
| `template`
| A link:https://handlebarsjs.com/[Handlebars]-style template which can be used to output a modified message.
|===
.hotdog.yml
[source,yaml]
----
- regex: '^hello\s+(?P<name>\w+)?'
  actions:
    - type: replace
      template: |
        Why hello there {{name}}!
----
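A minimal Python sketch of the substitution performed by this example follows. It is not a Handlebars implementation: `render` is a hypothetical helper that only understands `{{name}}` placeholders, and rendering unknown or unmatched names as empty text is an assumption modeled on Handlebars' default behavior.

```python
import re

VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders; unknown names render as empty text."""
    return VARIABLE.sub(lambda m: str(variables.get(m.group(1)) or ""), template)


line = "hello world"
match = re.match(r"^hello\s+(?P<name>\w+)?", line)
# Named groups are added to the variables available to the template.
variables = {"msg": line, **match.groupdict()}
output = render("Why hello there {{name}}!", variables)
```

For the input `hello world`, the `name` group captures `world` and the rendered output is `Why hello there world!`.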
[[action-stop]]
===== Stop
The `stop` action does nothing more than stop processing on the message. It is
not particularly useful except in cases where `hotdog` should match on a
message and then effectively discard it.
[[metrics]]
== Metrics
`hotdog` is designed to emit Statsd metrics to the statsd endpoint configured
in the <<yml-metrics,metrics>> section. Each metric will be prefixed under `hotdog.*`.
|===
| Key | Description
| `hotdog.connections`
| Gauge tracking the number of connections
| `hotdog.lines`
| Counter tracking the number of lines received by `hotdog`
| `hotdog.kafka.submitted`
| Counter tracking the number of messages submitted to Kafka
| `hotdog.kafka.submitted.<topic>`
| Counter tracking the number of messages submitted to each Kafka topic
| `hotdog.kafka.producer.sent`
| Timer which tracks the amount of time it takes to actually write messages to Kafka
| `hotdog.kafka.producer.error.*`
| Counters which count the number of different errors encountered while sending messages to Kafka. The possible metric names depend on the link:https://docs.rs/rdkafka/0.23.1/rdkafka/error/enum.RDKafkaError.html[RDKafkaError] enumeration from the underlying library.
| `hotdog.error.log_parse`
| Number of the log lines received which could not be parsed as link:https://tools.ietf.org/html/rfc5424[RFC 5424] syslog lines.
| `hotdog.error.full_internal_queue`
| Count tracking the number of log lines which were *dropped* due to a full internal queue. Typically indicates an issue between `hotdog` and the Kafka brokers.
| `hotdog.error.internal_push_failed`
| Number of lines dropped because they could not be sent into the internal queue.
| `hotdog.error.topic_parse_failed`
| Number of lines dropped because the configured dynamic topic could not be parsed properly (typically indicates a configuration error).
| `hotdog.error.merge_of_invalid_json`
| Count of lines which could not have a merge action applied as configured due to a configuration error
| `hotdog.error.merge_target_not_json`
| Count of lines received for a merge action which were not JSON, and therefore could not be merged.
|===
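When checking what arrives at the statsd endpoint, it can help to decode the plain-text statsd line format, `name:value|type`, where the type is `c` for counters, `g` for gauges, and `ms` for timers. The sketch below is a hypothetical debugging helper; the sample payloads are invented for illustration and are not captured `hotdog` output.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" counter, "g" gauge, "ms" timer


def parse_statsd(datagram: str) -> Metric:
    """Parse one `name:value|type` statsd line, ignoring any sample-rate suffix."""
    name, _, rest = datagram.strip().partition(":")
    fields = rest.split("|")
    return Metric(name, float(fields[0]), fields[1])


# Sample payloads are invented for illustration; real values will differ.
samples = [
    "hotdog.lines:1|c",
    "hotdog.connections:3|g",
    "hotdog.kafka.producer.sent:12|ms",
]
parsed = [parse_statsd(sample) for sample in samples]
```

Each entry in `parsed` pairs a metric name from the table above with its numeric value and statsd type.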
[[development]]
== Development
Hotdog is tested against the latest Rust stable. A simple `cargo build` should
compile a working `hotdog` binary for your platform.
On Linux systems it is easy to test with:
[source,bash]
----
logger --server 127.0.0.1 -T -P 1514 "hello world"
logger --server 127.0.0.1 -T -P 1514 -f example.log
----
For TLS connections, you can use the `openssl` `s_client` command:
[source,bash]
----
echo '<13>1 2020-04-18T15:16:09.956153-07:00 coconut tyler - - [timeQuality tzKnown="1" isSynced="1" syncAccuracy="505061"] hello world' | openssl s_client -connect localhost:6514
----
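Where `logger` or `openssl` are not available, a test message can also be produced with a short script. The Python sketch below builds a minimal RFC 5424 line (the priority is `facility * 8 + severity`, so `user` and `notice` give the `<13>` seen above) and sends it newline-terminated over TCP. `format_rfc5424` and `send_line` are hypothetical helpers, and the hostname and appname values are just the samples used above.

```python
import socket
import ssl
from datetime import datetime, timezone

FACILITY_USER = 1    # RFC 5424 facility code for "user"
SEVERITY_NOTICE = 5  # RFC 5424 severity code for "notice"


def format_rfc5424(msg: str, hostname: str, appname: str,
                   facility: int = FACILITY_USER,
                   severity: int = SEVERITY_NOTICE) -> str:
    """Build a minimal RFC 5424 line with nil PROCID, MSGID and structured data."""
    pri = facility * 8 + severity
    timestamp = datetime.now(timezone.utc).isoformat()
    return f"<{pri}>1 {timestamp} {hostname} {appname} - - - {msg}"


def send_line(line: str, host: str = "127.0.0.1", port: int = 1514,
              tls: bool = False) -> None:
    """Send one newline-terminated line over plaintext or TLS-wrapped TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        if tls:
            # The default context verifies the server certificate, so a
            # self-signed certificate must be trusted first for this to succeed.
            context = ssl.create_default_context()
            with context.wrap_socket(sock, server_hostname=host) as secure:
                secure.sendall(line.encode("utf-8") + b"\n")
        else:
            sock.sendall(line.encode("utf-8") + b"\n")


line = format_rfc5424("hello world", hostname="coconut", appname="tyler")
```

With `hotdog` listening as configured earlier, `send_line(line)` targets the plaintext port, and `send_line(line, host="localhost", port=6514, tls=True)` mirrors the `s_client` example.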
=== Profiling
Profiling `hotdog` is best done on a Linux host with the `perf` tool, e.g.
[source,bash]
----
RUST_LOG=info perf record --call-graph dwarf -- ./target/debug/hotdog -c ./hotdog.yml
perf report -ng --no-inline
----
By default this may run with a single thread. To increase the parallelism of
`hotdog` while profiling, be sure to set the `SMOL_THREADS` environment
variable.
The link:https://github.com/KDAB/hotspot[hotspot] profiler visualizer tool works
well with the generated reports.
== Similar Projects
`hotdog` was originally motivated by challenges with
link:https://github.com/rsyslog/rsyslog[rsyslog], a desire for a simple
configuration, and the need for built-in metrics.
Some other similar projects which can be used to get logs into Kafka:
* link:https://github.com/elastic/logstash[logstash]
* link:https://github.com/syslog-ng/syslog-ng[syslog-ng]
* link:https://github.com/timberio/vector[vector]
* link:https://github.com/uswitch/syslogger[syslogger], which doesn't process
messages itself, but rather integrates with `rsyslog`.