https://github.com/civitaspo/fluent-plugin-cat-sweep
- Host: GitHub
- URL: https://github.com/civitaspo/fluent-plugin-cat-sweep
- Owner: civitaspo
- License: mit
- Created: 2015-06-16T03:07:13.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-01-10T00:59:53.000Z (almost 5 years ago)
- Last Synced: 2024-10-13T21:06:04.440Z (3 months ago)
- Language: Ruby
- Size: 62.5 KB
- Stars: 12
- Watchers: 3
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# fluent-plugin-cat-sweep
[![Build Status](https://secure.travis-ci.org/civitaspo/fluent-plugin-cat-sweep.png?branch=master)](http://travis-ci.org/civitaspo/fluent-plugin-cat-sweep)
Fluentd plugin that reads data from files and then removes or moves them after processing.
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'fluent-plugin-cat-sweep'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install fluent-plugin-cat-sweep

## Basic Behavior
Assume that an application writes logs into the `/tmp/test` directory as

```
tmp/test
├── access.log.201509151611
├── access.log.201509151612
└── access.log.201509151613
```

at one-minute intervals.

This plugin watches the directory (`file_path_with_glob tmp/test/access.log.*`), reads the contents of each file, and sweeps it (default: remove) once 60 seconds have passed since the file's mtime (configurable with `waiting_seconds`).

Our assumption is that this mechanism provides more durability than `in_tail`, because reading whole files in batches is more robust than streaming reads.
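
The selection rule described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the plugin's actual code: `sweepable_files` is a hypothetical helper name, and its arguments simply mirror the `file_path_with_glob` and `waiting_seconds` parameters.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of the plugin): returns the files matching
# `glob` whose mtime is at least `waiting_seconds` in the past, oldest first.
def sweepable_files(glob, waiting_seconds, now: Time.now)
  Dir.glob(glob)
     .select { |path| File.file?(path) && now - File.mtime(path) >= waiting_seconds }
     .sort_by { |path| File.mtime(path) }
end

# Demo in a throwaway directory: one file is aged artificially, one is fresh.
Dir.mktmpdir do |dir|
  old_file = File.join(dir, 'access.log.201509151611')
  new_file = File.join(dir, 'access.log.201509151613')
  File.write(old_file, "old\n")
  File.write(new_file, "new\n")

  two_minutes_ago = Time.now - 120
  File.utime(two_minutes_ago, two_minutes_ago, old_file)

  # Only the aged file qualifies with a 60-second threshold.
  puts sweepable_files(File.join(dir, 'access.log.*'), 60).map { |p| File.basename(p) }
end
```

Files that the application is still writing to keep a recent mtime, so they are skipped until they have been idle for the whole waiting period.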
## Potential problem of in_tail
Assume that an application writes logs into `/tmp/test/access.log` and rotates the file every minute, as follows:

(initial state)

```
tmp/test
└── access.log (i-node 4478316)
```

(one minute later)

```
tmp/test
├── access.log (i-node 4478319)
└── access.log.1 (i-node 4478316)
```

(two minutes later)

```
tmp/test
├── access.log (i-node 4478322)
├── access.log.1 (i-node 4478319)
└── access.log.2 (i-node 4478316)
```

Your `in_tail` configuration may look like the following:

```apache
<source>
  @type tail
  path tmp/test/access.log
  pos_file /var/log/td-agent/access.log.pos
  tag access
  format none
</source>
```

Now, imagine that the fluentd process dies (or is stopped manually for maintenance) just before the second file (i-node 4478319) is created, and that you restart it two minutes later. You then miss the second file (i-node 4478319): by the time fluentd is running again, that file has already been rotated to `access.log.1`, which the `path` setting does not match.

(initial state)

```
tmp/test
└── access.log (i-node 4478316) <= catch
```

(fluentd dies)

(one minute later)

```
tmp/test
├── access.log (i-node 4478319) <= miss
└── access.log.1 (i-node 4478316)
```

(two minutes later)

(fluentd restarts)

```
tmp/test
├── access.log (i-node 4478322) <= catch
├── access.log.1 (i-node 4478319) <= miss
└── access.log.2 (i-node 4478316)
```

## Configuration

```
<source>
  @type cat_sweep

  # Required. Process files that match this glob pattern.
  file_path_with_glob /tmp/test/file_*

  # Parser Plugin Setting
  # You can use the old style instead. (Not recommended)
  # ===
  # format tsv
  # keys xpath,access_time,label,payload
  # ===
  <parse>
    @type tsv
    keys xpath,access_time,label,payload
  </parse>

  # Required. Process files that are older than this parameter (seconds).
  # [WARNING!!]: this plugin moves or removes files even if the files are still open.
  # Make sure to set this parameter to a number of seconds after which the application has definitely closed the files.
  waiting_seconds 60

  # Optional. default is file.cat_sweep
  tag test.input

  # Optional. Files being processed are renamed with this suffix. default is .processing
  processing_file_suffix .processing

  # Optional. Files that caused an error are renamed with this suffix. default is .error
  error_file_suffix .err

  # Optional. Line terminator. default is "\n"
  line_terminated_by ,

  # Optional. Max bytes a single line can have. default 536870912 (512MB)
  oneline_max_bytes 128000

  # Optional. Processed files are moved to this directory.
  # default '/tmp'
  move_to /tmp/test_processed

  # Optional. If this parameter is specified, the `move_to` option is ignored.
  # Processed files are removed instead of being moved to the `move_to` directory.
  # default is false.
  remove_after_processing true

  # Optional. default 5 seconds.
  run_interval 10

  # Optional. Emit the entire file contents as one event stream; by default each line is emitted as an event.
  # This ensures that fluentd emits the entire file contents together. Please note that buffer_chunk_limit
  # must be larger than the size in bytes of a file to be sent by buffered output plugins such as out_forward, out_s3.
  file_event_stream false

  # Optional. When locking files with flock, open them in "r+" mode if this option is true, otherwise in "r" mode.
  # default is false.
  flock_with_rw_mode false
</source>
```

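
To make the interplay of these options easier to follow, here is a rough Ruby sketch of the lifecycle they describe: rename the file with the processing suffix, split it on the line terminator, then remove or move it, and rename it with the error suffix on failure. It is an assumption-laden illustration, not the plugin's implementation: `cat_and_sweep` is a hypothetical method, the keyword arguments only borrow the option names and defaults listed above, and keeping the original basename inside `move_to` is a guess.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical illustration only; the real plugin may differ in every detail.
def cat_and_sweep(path, line_terminated_by: "\n", processing_file_suffix: '.processing',
                  error_file_suffix: '.error', move_to: '/tmp', remove_after_processing: false)
  processing_path = path + processing_file_suffix
  File.rename(path, processing_path) # mark the file as being processed

  begin
    File.open(processing_path, 'r') do |io|
      # Split on the configured terminator and hand each record to the caller.
      io.each_line(line_terminated_by, chomp: true) { |line| yield line }
    end
  rescue StandardError
    # Keep the failed file around under the error suffix for inspection.
    File.rename(processing_path, path + error_file_suffix)
    raise
  end

  if remove_after_processing
    File.delete(processing_path) # `move_to` is ignored in this mode
  else
    FileUtils.mkdir_p(move_to)
    # Assumption: the swept file keeps its original basename.
    FileUtils.mv(processing_path, File.join(move_to, File.basename(path)))
  end
end

# Demo: a comma-terminated file that is removed after being read.
Dir.mktmpdir do |dir|
  file = File.join(dir, 'file_1')
  File.write(file, 'a,b,c')
  cat_and_sweep(file, line_terminated_by: ',', remove_after_processing: true) do |line|
    puts line
  end
  puts File.exist?(file)
end
```

Renaming before reading is what makes the `waiting_seconds` warning matter: if the application still holds the file open, it keeps writing to a file that is about to be swept.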
## ChangeLog
[CHANGELOG.md](CHANGELOG.md)
## Contributing
1. Fork it ( https://github.com/civitaspo/fluent-plugin-cat-sweep/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request