https://github.com/ironsource/tailf2kafka
Watch and tail files with specified time based patterns and push them to kafka
https://github.com/ironsource/tailf2kafka
devops ruby
Last synced: about 1 year ago
JSON representation
Watch and tail files with specified time based patterns and push them to kafka
- Host: GitHub
- URL: https://github.com/ironsource/tailf2kafka
- Owner: ironSource
- License: mit
- Created: 2015-09-23T13:23:19.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2023-03-23T19:10:41.000Z (about 3 years ago)
- Last Synced: 2025-02-02T10:17:58.881Z (about 1 year ago)
- Topics: devops, ruby
- Language: Ruby
- Homepage:
- Size: 33.2 KB
- Stars: 0
- Watchers: 12
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Tailf2Kafka
Watch and tail files in dirs with specified filename time based patterns and push them to kafka.
## Installation
Install libsnappy dev libs if you want to take advantage of compression
apt-get install libsnappy-dev
Add this line to your application's Gemfile:
gem 'tailf2kafka'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install tailf2kafka
## Usage
$ tailf2kafka -h
Usage: tailf2kafka [options]
--config PATH Path to settings config
-h, --help Display this screen
$
## Config
tailf:
files:
- topic: haproxy
prefix: /var/log/haproxy/haproxy
suffix: ''
time_pattern: ".%Y-%m-%d.%H"
position_file: "/var/lib/haproxy/tail2kafka.offsets"
flush_interval: 1
max_batch_lines: 1024
from_begining: true
delete_old_tailed_files: true
kafka:
brokers: ["broker1:9092", "broker2:9092", "broker3:9092"]
producer_type: sync
produce: true
* kafka.brokers - Array of kafka brokers to connect to
* kafka.producer_type - type of producer sync or async
* kafka.produce - if false will not conect to kafka and will not produce any messages to it
* tailf.position_file - file where to save tailed files offsets which were pushed to kafka
* tailf.flush_interval - how often in seconds to save the offsets to a file
* tailf.max_batch_lines - max number of lines to batch in each send request
* tailf.from_beggining - in case of a new file added to tailing , if to start tailing from beggining or end of the file
* tailf.delete_old_tailed_files - if to delete files once their time_pattern does not match the current time window and if they have been fully produced to kafka
* tailf.files - array of file configs for tail, each tailed file configs consists of:
* topic - which kafka topic to produce the messages to
* prefix - the files prefix to watch for
* time_pattern - ruby time pattern of files to tail
* suffix - optional suffix of files to watch for
so the tool will watch for files that match - prefix + time_pattern + suffix
## Features/Facts
* The config is validated by [schash](https://github.com/ryotarai/schash) gem
* Tailed files are watched for changes by [rb-notify](https://github.com/nex3/rb-inotify) gem
* Dirnames of all files prefixes are watched for new files creation or files moved to the dir and are automaticaly
added to tailing.
* As well dirnames are watched for deletion or files being moved out of directory, and they are removed from the list of files watched for changing.
* Based time_pattern, files are periodicaly autodeleted , thus avoiding need for log rotation tools.
* Files are matched by converting time_pattern to a regexp
## Contributing
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request
6. Go to 1