======
reaper
======

.. contents::
   :depth: 3
..

.. section-numbering::

``reaper`` is a simple tool to collect access logs from web servers and
publish the logs to an external message queue.

::

,,,,, ,,,,,
,,,,,,,,, ,,,,,,,,,,
,,,,,,,,,,,, ,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,
## ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
#### ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
##### ,,,@@@@@*,,,,,,,,,,,,,,,@@@@@,,,
###### ,,,,,#@@@@@&,,,,,,,,,,/@@@@@@,,,,
####### ,,,,,,,@@@@@@,,,,,,,,@@@@@@,,,,,,,
######### ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
########## ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
########### ,,,,,,,,,,,,,,,,(/,,,,,,,,,,,,,,,,
########### ,,,,,,@,/@@,,@,,@@,,@,,@@*,@,,,,,,
############ ,,,,,,@@@@@@&@@@@@@@@@@@@@,,,,,,
############# ,,,,,,,@@@@@@@@@@@@@@@@,,,,,,,
############## ,,,,@,@@@@@@@@@@@@,@,,,,
############## ,,,,@@,@@@@,@@,,*,
############## .,,,*,,@@,,/,,,
############### ,,,,,%#,,,,,
############### ,,,,,,,,,,
############### .,,,,,,,,
################
################
################
################ GIVE ME YOUR LOGS
################
##################
##################

Features
========

- Collect logs over TCP/UDP syslog
- Syslog RFC3164 or RFC5424
- Collect logs on stdin
- Parse log formats: JSON, key/values, common, combined
- Stream access logs with websocket
- Download logs with HTTP
- Filter out unwanted log lines (predicate in JavaScript)
- Can write collected logs to stdout, stderr, file
- Can write collected logs to databases: PostgreSQL/TimescaleDB,
  Elasticsearch
- Can write collected logs to message brokers: RabbitMQ, nsqd, any
  STOMP-enabled message broker
- Can write collected logs to a distributed log: Kafka
- Can write collected logs to a Redis list
- Can forward collected logs to another reaper instance
- Should work on any \*NIX

Project status
==============

Alpha. Version 0.1.0.

reaper is functional and can be used in simple environments, but it lacks
proper test cases and has not been performance-tested in busy environments.

Getting Started
===============

Install
-------

- Binary releases

  https://github.com/stephane-martin/reaper/releases

  Just copy the binary into a directory in your PATH.

- Compile from source

  ``git clone https://github.com/stephane-martin/reaper`` in an
  appropriate folder (GOPATH…)

  ``make debug`` or ``make release``

Configure
---------

Currently reaper does not use a configuration file. Arguments are passed
on the command line or with environment variables.

Inline help
-----------

``reaper --help``

``reaper (command) --help``

Use reaper
----------

Listen for access log entries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TCP syslog
^^^^^^^^^^

Start reaper with ``--tcp 127.0.0.1:1514``. Here 127.0.0.1 is the listen
address.

UDP syslog
^^^^^^^^^^

Start reaper with ``--udp 127.0.0.1:1514``.

This can be used with nginx or caddy. In nginx.conf:

::

   access_log syslog:server=127.0.0.1:1514,facility=daemon,tag=nginxaccess,severity=info jrich;

Syslog protocol
^^^^^^^^^^^^^^^

By default, the syslog protocol is assumed to be RFC3164. Use the global
flag ``--rfc5424`` to switch to RFC5424.
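
For orientation, the sketch below shows how the two framings differ. It is
not reaper code. It only computes the priority value (``PRI``) that
corresponds to ``facility=daemon,severity=info`` and builds one illustrative
line in each style. The hostname ``web1``, the timestamps and the message
body are invented sample values.

```python
# Illustrative only: how the syslog priority (PRI) is derived, and what the
# two header styles look like. Hostname, timestamps and message are invented.

FACILITY_DAEMON = 3   # numeric code of the "daemon" facility
SEVERITY_INFO = 6     # numeric code of the "info" severity


def pri(facility: int, severity: int) -> int:
    """PRI = facility * 8 + severity (same rule in RFC3164 and RFC5424)."""
    return facility * 8 + severity


def rfc3164_line(msg: str) -> str:
    # Layout: <PRI>TIMESTAMP HOSTNAME TAG: MSG
    p = pri(FACILITY_DAEMON, SEVERITY_INFO)
    return f"<{p}>Oct 11 22:14:15 web1 nginxaccess: {msg}"


def rfc5424_line(msg: str) -> str:
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    p = pri(FACILITY_DAEMON, SEVERITY_INFO)
    return f"<{p}>1 2003-10-11T22:14:15Z web1 nginxaccess - - - {msg}"


if __name__ == "__main__":
    print(rfc3164_line('{"status":200}'))
    print(rfc5424_line('{"status":200}'))
```

If the sender emits the second layout, reaper needs ``--rfc5424``.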

stdin
^^^^^

Start reaper with ``--stdin``.

This can be used with Apache. For example in Apache configuration:

::

   CustomLog "||/path/to/reaper --format combined --stdin" combined

Configure access logs format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

reaper needs to know the format in which the web server writes access
log entries. Use the ``--format`` flag to specify it.

JSON
^^^^

``reaper --udp 127.0.0.1:1514 --format json``

Example nginx configuration:

::

   log_format jrich escape=json
       '{'
           '"timestamp":"$time_iso8601",'
           '"method":"$request_method",'
           '"scheme":"$scheme",'
           '"host":"$host",'
           '"server":"$server_name",'
           '"uri":"$uri",'
           '"duration":$request_time,'
           '"length":$request_length,'
           '"status":$status,'
           '"sent":$bytes_sent,'
           '"agent":"$http_user_agent",'
           '"remoteaddr":"$remote_addr",'
           '"remoteuser":"$remote_user"'
       '}';

   access_log syslog:server=127.0.0.1:1514,facility=daemon,tag=nginxaccess,severity=info jrich;
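
To give an idea of what a consumer receives with this format, here is a
small sketch that decodes one entry. The sample line is hypothetical: the
field names come from the ``jrich`` format above, but every value is
invented.

```python
import json

# Hypothetical access log entry shaped like the ``jrich`` format above.
SAMPLE = (
    '{"timestamp":"2018-03-01T12:00:00+00:00","method":"GET","scheme":"https",'
    '"host":"example.org","server":"example.org","uri":"/index.html",'
    '"duration":0.004,"length":312,"status":200,"sent":1532,'
    '"agent":"curl/7.58.0","remoteaddr":"203.0.113.7","remoteuser":""}'
)

entry = json.loads(SAMPLE)

# Fields written without quotes in log_format (duration, length, status, sent)
# decode as numbers; the quoted ones decode as strings.
print(entry["status"], type(entry["duration"]).__name__, entry["host"])
```

Leaving ``$request_time``, ``$request_length``, ``$status`` and
``$bytes_sent`` unquoted is what makes them numeric on the consumer side.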

Key/values
^^^^^^^^^^

``reaper --udp 127.0.0.1:1514 --format kv``

Example nginx configuration:

::

   log_format rich
       'remote_addr="$remote_addr" remote_user="$remote_user" time="$time_iso8601" length=$request_length'
       ' host="$host" request="$request_uri" uri="$uri" status=$status bytes_sent=$bytes_sent agent="$http_user_agent"'
       ' duration=$request_time upstream_duration=$upstream_response_time method="$request_method" scheme="$scheme"'
       ' server="$server_name"';
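
The following sketch shows one way to split such a line into fields. It is
an illustration only and not the parser reaper uses. The helper name
``parse_kv`` and the sample line are made up for this example.

```python
import shlex


def parse_kv(line: str) -> dict:
    """Split a line of key="value" / key=value pairs into a dict of strings."""
    fields = {}
    for token in shlex.split(line):        # shlex honours the double quotes
        key, sep, value = token.partition("=")
        if sep:                            # ignore tokens without '='
            fields[key] = value
    return fields


# Hypothetical, shortened line shaped like the ``rich`` format above.
SAMPLE = (
    'remote_addr="203.0.113.7" remote_user="-" '
    'time="2018-03-01T12:00:00+00:00" length=312 host="example.org" '
    'status=200 agent="Mozilla/5.0 (X11; Linux x86_64)" duration=0.004'
)

if __name__ == "__main__":
    for key, value in parse_kv(SAMPLE).items():
        print(key, "=>", value)
```

All values come back as strings here; converting ``status`` or ``duration``
to numbers is left to the caller.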

common log format
^^^^^^^^^^^^^^^^^

``reaper --udp 127.0.0.1:1514 --format common``

combined log format
^^^^^^^^^^^^^^^^^^^

``reaper --udp 127.0.0.1:1514 --format combined``
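
As a reminder of what these two formats look like, the sketch below parses
one line of each with a regular expression. It is not reaper's parser, and
the group names are chosen for this example only. The sample line follows
the usual ``host ident user [time] "request" status bytes "referer" "agent"``
layout; the common format is the same without the last two fields.

```python
import re

# The trailing referer/agent group is optional, so the same pattern
# accepts both the common and the combined layout.
ACCESS_LINE = re.compile(
    r'(?P<remoteaddr>\S+) \S+ (?P<remoteuser>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<sent>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_access_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    m = ACCESS_LINE.match(line)
    return m.groupdict() if m else None


COMMON = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
COMBINED = COMMON + (' "http://www.example.com/start.html" '
                     '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

if __name__ == "__main__":
    print(parse_access_line(COMMON))
    print(parse_access_line(COMBINED))
```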

Filter access logs
~~~~~~~~~~~~~~~~~~

The ``--filterout EXPR`` global flag specifies a filter.

EXPR is a JavaScript expression that can use the log entry fields. If
EXPR evaluates to true, the entry is filtered out. Multiple ``--filterout``
flags can be used. In that case, an entry is filtered out if any of the
expressions evaluates to true.

Example:

``reaper --udp 127.0.0.1:1514 --format json --filterout 'host=="example.org"' stdout``

Log entries for requests to http://example.org will be filtered out.

Please note that filtering has a performance cost: it uses an embedded
JavaScript engine.
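
The "any expression matches" rule can be sketched as follows. This is only a
model of the semantics described above: the real expressions are JavaScript
evaluated by reaper, whereas the Python lambdas below are stand-ins, and the
second predicate (``status==404``) is a hypothetical extra filter.

```python
from typing import Callable, Iterable, List

Entry = dict
Predicate = Callable[[Entry], bool]


def is_filtered_out(entry: Entry, predicates: Iterable[Predicate]) -> bool:
    """An entry is dropped as soon as one predicate is true."""
    return any(p(entry) for p in predicates)


def keep(entries: Iterable[Entry], predicates: List[Predicate]) -> List[Entry]:
    """Return the entries that no predicate filters out."""
    return [e for e in entries if not is_filtered_out(e, predicates)]


# Python stand-ins for:
#   --filterout 'host=="example.org"' --filterout 'status==404'
PREDICATES = [
    lambda e: e.get("host") == "example.org",
    lambda e: e.get("status") == 404,
]

ENTRIES = [
    {"host": "example.org", "status": 200},   # dropped by the first filter
    {"host": "example.com", "status": 404},   # dropped by the second filter
    {"host": "example.com", "status": 200},   # kept
]

if __name__ == "__main__":
    print(keep(ENTRIES, PREDICATES))
```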

Forward access logs
~~~~~~~~~~~~~~~~~~~

reaper can forward access logs to various destinations. The type of
destination is selected with a command on the reaper command line, placed
after the global flags described above.

When the destination is not reachable, log entries are buffered in the
embedded nsqd instance. When the destination is reachable again,
buffered entries will be forwarded. So you do not need to start the
destination before reaper.

Each destination has specific flags to configure it.

stdout, stderr
^^^^^^^^^^^^^^

- ``reaper --udp 127.0.0.1 stdout``
- ``reaper --udp 127.0.0.1 stderr``

file
^^^^

- ``reaper --udp 127.0.0.1 file --filename /tmp/access.log`` => write
  log entries to /tmp/access.log
- ``reaper --udp 127.0.0.1 file --gzip --filename /tmp/access.log.gz``
  => write compressed log entries to /tmp/access.log.gz

RabbitMQ
^^^^^^^^

Forward logs to a RabbitMQ exchange.

``reaper --udp 127.0.0.1 rabbitmq --uri "amqp://guest:guest@localhost:5672/" --exchange exname --routing-key key --type direct``

This will forward entries to a RabbitMQ broker located at
localhost:5672, using guest/guest as credentials, on the / virtual host,
to the direct exchange ``exname``, with ``key`` as the routing key.

STOMP
^^^^^

``./reaper_debug --udp 127.0.0.1:1514 stomp --login user --passcode password --host virtualhost --destination /queue/reaper --addr 192.168.1.2:61613``

Elasticsearch
^^^^^^^^^^^^^

Forward logs to an Elasticsearch server.

``reaper --udp 127.0.0.1 elasticsearch --url http://127.0.0.1:9200 --index indexname``

Redis
^^^^^

Forward logs to Redis, using a redis list (think LPOP, RPUSH).

``reaper --udp 127.0.0.1 redis --addr 127.0.0.1:6379 --listname thelistkey --database 6 --password pass``
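
On the consuming side, entries can be read back from the list. The sketch
below makes two assumptions that are not confirmed here: that reaper appends
with RPUSH, so LPOP returns the oldest entry first, and that each list
element is one serialized entry. The helper name ``drain`` is made up. It
accepts any client object exposing ``lpop(key)``, such as a ``redis.Redis``
instance from the redis-py package.

```python
def drain(client, list_key: str, limit: int = 100) -> list:
    """Pop up to `limit` entries from the head of the list.

    `client` is any object with an lpop(key) method that returns None
    when the list is empty (redis-py's redis.Redis behaves this way).
    """
    entries = []
    for _ in range(limit):
        raw = client.lpop(list_key)
        if raw is None:          # list is empty: stop early
            break
        entries.append(raw)
    return entries


# Usage with redis-py (third-party, assumed to be installed), matching the
# flags of the command above:
#
#   import redis
#   r = redis.Redis(host="127.0.0.1", port=6379, db=6, password="pass")
#   for raw in drain(r, "thelistkey"):
#       print(raw)
```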

Kafka
^^^^^

``reaper --udp 127.0.0.1 kafka --broker 192.168.1.2:9092 --broker 192.168.1.3:9092 --broker 192.168.1.4:9092 --topic topicname``

PostgreSQL/TimescaleDB
^^^^^^^^^^^^^^^^^^^^^^

First you need to create a table in PostgreSQL that is consistent with
the log format.

For example:

::

   +------------+--------------------------+-------------------+
   | Column     | Type                     | Modifiers         |
   |------------+--------------------------+-------------------+
   | timestamp  | timestamp with time zone | not null          |
   | method     | text                     | default ''::text  |
   | scheme     | text                     | default ''::text  |
   | host       | text                     | default ''::text  |
   | server     | text                     | default ''::text  |
   | uri        | text                     | default ''::text  |
   | duration   | double precision         | default 0         |
   | length     | integer                  | default 0         |
   | status     | integer                  | default 0         |
   | sent       | integer                  | default 0         |
   | agent      | text                     | default ''::text  |
   | remoteaddr | text                     | default ''::text  |
   | remoteuser | text                     | default ''::text  |
   +------------+--------------------------+-------------------+

   Indexes:
       "reaper_duration_timestamp_idx" btree (duration, "timestamp" DESC)
       "reaper_host_timestamp_idx" btree (host, "timestamp" DESC)
       "reaper_length_timestamp_idx" btree (length, "timestamp" DESC)
       "reaper_method_timestamp_idx" btree (method, "timestamp" DESC)
       "reaper_remoteaddr_timestamp_idx" btree (remoteaddr, "timestamp" DESC)
       "reaper_scheme_timestamp_idx" btree (scheme, "timestamp" DESC)
       "reaper_sent_timestamp_idx" btree (sent, "timestamp" DESC)
       "reaper_server_timestamp_idx" btree (server, "timestamp" DESC)
       "reaper_timestamp_idx" btree ("timestamp" DESC)

Then:

::

   reaper --udp 127.0.0.1:1514 pgsql \
       --uri "postgres://user:[email protected]/dbname" \
       --table tablename \
       --fields "timestamp,method,scheme,host,server,uri,duration,length,status,sent,agent,remoteaddr,remoteuser"

External nsqd
^^^^^^^^^^^^^

``reaper --udp 127.0.0.1:1514 nsq --addr 192.168.1.2:4150 --topic topicname --json``

Forward to another reaper instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On machine A 192.168.1.2 (with web server):

``reaper --udp 127.0.0.1:1514 nsq --addr 192.168.1.3:4150 --topic embedded``

On machine B 192.168.1.3:

``reaper --nsqd-address 192.168.1.3 --nsqd-tcp-port 4150 pgsql ...``

HTTP API
~~~~~~~~

If started with ``--http-address``, reaper exposes an HTTP API.

Endpoints:

- /status => just returns a 200 HTTP status code.

- /metrics => Prometheus metrics (including the embedded nsqd metrics).

- POST /download/:clientid?wait=3000&size=1000 => creates a channel of
  access log entries and downloads entries.

  ``size`` is the number of entries to be returned. ``wait`` is the number
  of milliseconds to wait.

  After the first POST call, an nsq channel is created. All received
  entries will be copied to this channel. Each successive POST call
  will return different entries.

- DELETE /download/:clientid => deletes a previously created channel.
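
A client for the download endpoints could look like the sketch below, which
uses only the Python standard library. The base URL and the client id are
placeholders: the base URL is whatever address was given to
``--http-address``. The format of the response body is not described here,
so the sketch returns it as raw bytes.

```python
import urllib.parse
import urllib.request


def download_url(base: str, client_id: str,
                 wait_ms: int = 3000, size: int = 1000) -> str:
    """Build the POST /download/:clientid URL with its query parameters."""
    query = urllib.parse.urlencode({"wait": wait_ms, "size": size})
    cid = urllib.parse.quote(client_id, safe="")
    return f"{base.rstrip('/')}/download/{cid}?{query}"


def download(base: str, client_id: str,
             wait_ms: int = 3000, size: int = 1000) -> bytes:
    """Fetch the next batch of entries for this client id (raw body)."""
    req = urllib.request.Request(
        download_url(base, client_id, wait_ms, size), method="POST")
    # Allow a little more than `wait_ms` before giving up on the socket.
    with urllib.request.urlopen(req, timeout=wait_ms / 1000 + 5) as resp:
        return resp.read()


def delete_channel(base: str, client_id: str) -> int:
    """Delete the channel created for this client id; returns the status."""
    cid = urllib.parse.quote(client_id, safe="")
    req = urllib.request.Request(
        f"{base.rstrip('/')}/download/{cid}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder address and client id, for illustration only.
    print(download_url("http://127.0.0.1:8080", "client1"))
```

Calling ``download`` repeatedly with the same client id should yield
successive batches, and ``delete_channel`` cleans up once the client is done.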

Websocket API
~~~~~~~~~~~~~

If started with ``--websocket-address``, reaper exposes a websocket
endpoint.

- /stream: stream received entries to the websocket client.

Logging
~~~~~~~

By default, reaper's own logs are written to stderr.

The logging level can be set with ``--loglevel`` [debug, info, warn,
error, crit].

Alternatively, reaper can log to syslog with ``--syslog``.

Design
======

reaper embeds an nsqd service (https://nsq.io). When access log entries
are received on TCP, UDP or stdin, they are first stored in the embedded
nsqd. reaper only deletes an access log entry once it has been
reliably sent to the configured destination.

Forwarding to the destination is done asynchronously to achieve good
performance.
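
The store-then-forward idea can be modelled in a few lines. This is a
conceptual sketch only, not reaper's implementation: an in-memory ``deque``
stands in for the embedded nsqd, forwarding is synchronous for simplicity,
and the class and method names are invented.

```python
from collections import deque


class StoreAndForward:
    """Buffer entries and drop each one only after a confirmed send."""

    def __init__(self, send):
        self._queue = deque()
        self._send = send          # callable(entry) -> True when delivered

    def receive(self, entry) -> None:
        self._queue.append(entry)  # always store first

    def pending(self) -> int:
        return len(self._queue)

    def flush(self) -> int:
        """Try to forward buffered entries in order; return how many went out."""
        delivered = 0
        while self._queue:
            entry = self._queue[0]       # peek, do not remove yet
            if not self._send(entry):
                break                    # destination down: keep the entry
            self._queue.popleft()        # delete only after confirmed delivery
            delivered += 1
        return delivered


if __name__ == "__main__":
    out = []
    buf = StoreAndForward(lambda e: out.append(e) is None)
    buf.receive("entry 1")
    buf.receive("entry 2")
    print(buf.flush(), out)
```

Because an entry leaves the buffer only after ``send`` reports success, a
destination that is down at startup simply causes entries to accumulate
until a later ``flush`` succeeds.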

Changelog
=========

https://github.com/stephane-martin/reaper/blob/master/CHANGELOG.md