https://github.com/jameslikeslinux/duplog
Syslog Deduplicator
- Host: GitHub
- URL: https://github.com/jameslikeslinux/duplog
- Owner: jameslikeslinux
- Created: 2013-05-16T21:58:25.000Z (about 12 years ago)
- Default Branch: master
- Last Pushed: 2013-05-20T20:28:42.000Z (almost 12 years ago)
- Last Synced: 2024-07-11T00:54:42.188Z (10 months ago)
- Language: Java
- Size: 148 KB
- Stars: 16
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Duplog #
Duplog will deduplicate messages from multiple similar streams. This can be used to take syslog messages from multiple redundant rsyslog servers, strip out duplicates between the streams, and produce a complete record of log messages for an application like Splunk.

## Building ##

You need Apache Ant and a Java JDK installed. Then run:

    % ant

to fetch all of the project's dependencies and compile the source. An executable jar file can be built with:

    % ant jar

## Running ##
* On each rsyslog server, create a file such as:

      % cat /usr/local/libexec/rsyslog/send-to-rabbitmq
      #!/bin/sh
      exec /usr/bin/java -jar /path/to/duplog.jar inject

  Then define a configuration in rsyslog such as:

      $ModLoad omprog
      $ActionOMProgBinary /usr/local/libexec/rsyslog/send-to-rabbitmq
      *.* :omprog:

  Finally, make sure [RabbitMQ](http://www.rabbitmq.com/) is running locally on the default port.

* On the destination server, where deduplicated log messages are required, simply run:

      % java -jar /path/to/duplog.jar extract [-o OUTPUT_FILE] [-r REDIS_SERVER] syslog_server [syslog_server ...]

  where `syslog_server` is the hostname of a syslog server running RabbitMQ as above. A [Redis](http://redis.io/) server must be available to perform deduplication. It should be running on the default port with the following parameters set in `/etc/redis/redis.conf`:

      maxmemory # each unique message will consume about 100 bytes; configure based on messaging rate and available memory
      maxmemory-policy allkeys-lru

## Benchmarking ##

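Before measuring anything, it may help to see what the Redis settings from the Running section imply. The following is a minimal in-memory sketch, not Duplog's actual implementation (Duplog relies on a Redis server for this). The class name `LruDeduplicator`, its methods, and the idea of sizing by a "window" of seconds are all assumptions made for illustration; only the figure of roughly 100 bytes per unique message and the `allkeys-lru` policy come from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a Redis-backed duplicate check; not Duplog's code. */
public class LruDeduplicator {
    // Approximate per-message cost quoted in the README.
    static final long BYTES_PER_MESSAGE = 100L;

    private final Map<String, Boolean> seen;

    public LruDeduplicator(final int maxEntries) {
        // accessOrder=true keeps the least recently used key first, so
        // removeEldestEntry evicts it, loosely mimicking allkeys-lru.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns true the first time a message is seen, false for a remembered duplicate. */
    public boolean firstSeen(String message) {
        // put() returns the previous value, or null if the key was absent.
        return seen.put(message, Boolean.TRUE) == null;
    }

    /** Rough maxmemory estimate (assumed formula): rate * window * ~100 bytes. */
    public static long estimateMaxMemoryBytes(long messagesPerSecond, long windowSeconds) {
        return messagesPerSecond * windowSeconds * BYTES_PER_MESSAGE;
    }

    public static void main(String[] args) {
        LruDeduplicator dedup = new LruDeduplicator(2);
        System.out.println(dedup.firstSeen("a")); // true: new message
        System.out.println(dedup.firstSeen("b")); // true: new message
        System.out.println(dedup.firstSeen("a")); // false: duplicate from another stream
        System.out.println(dedup.firstSeen("c")); // true: new, evicts "b" (least recently used)
        System.out.println(dedup.firstSeen("b")); // true: "b" was forgotten, so it passes again
        System.out.println(estimateMaxMemoryBytes(1000, 60)); // 6000000
    }
}
```

The last `firstSeen("b")` call shows the trade-off of a bounded LRU store: once a key is evicted, a late duplicate is let through, which is why the memory limit should be sized against the messaging rate.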
To get a rough idea of how Duplog performs, you can pipe generated messages through the system.
* On one or more syslog servers (as defined above), run:

      % java -cp /path/to/duplog.jar edu.umd.it.duplog.benchmark.Producer <token> | java -jar /path/to/duplog.jar inject

  where `token` is a short string that is the same on each message producer, but different for each run. You should see an updating message like:

      Messages produced: A last second / B per second average

* On one or more deduplicating servers (as defined above), run:

      % java -jar /path/to/duplog.jar extract [-r REDIS_SERVER] syslog_server [syslog_server ...] -o - | java -cp /path/to/duplog.jar edu.umd.it.duplog.benchmark.Consumer

  You should see an updating message like:

      Messages consumed: X last second / Y per second average
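
The "last second / per second average" status lines above can be produced by a very small counter. The sketch below is one way to do it and is not the source of `edu.umd.it.duplog.benchmark.Consumer`; the class name `RateMeter`, its methods, and the use of an integer average are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/** Hypothetical message-rate counter; not the actual Duplog benchmark code. */
public class RateMeter {
    private long total;    // messages counted since start
    private long current;  // messages counted in the second in progress
    private long seconds;  // completed seconds

    /** Count one message. */
    public void record() {
        total++;
        current++;
    }

    /** Close the current one-second window and return a status line. */
    public String tick() {
        seconds++;
        long lastSecond = current;
        current = 0;
        return String.format(
                "Messages consumed: %d last second / %d per second average",
                lastSecond, total / seconds);
    }

    public static void main(String[] args) throws IOException {
        RateMeter meter = new RateMeter();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        long nextTick = System.nanoTime() + 1_000_000_000L;
        // Simplification: the clock is only checked when a line arrives,
        // so no status line is printed while the input is idle.
        while (in.readLine() != null) {
            meter.record();
            if (System.nanoTime() - nextTick >= 0) {
                System.out.println(meter.tick());
                nextTick += 1_000_000_000L;
            }
        }
    }
}
```

Something like `... extract ... -o - | java RateMeter` would exercise it, treating each line on standard input as one message.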