https://github.com/red-data-tools/fluent-plugin-s3-arrow
Extends the fluent-plugin-s3 compression algorithm to enable red-arrow compression.
arrow aws fluentd fluentd-plugin parquet s3
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/red-data-tools/fluent-plugin-s3-arrow
- Owner: red-data-tools
- License: apache-2.0
- Created: 2020-08-06T06:18:52.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-01-26T21:28:49.000Z (over 4 years ago)
- Last Synced: 2024-04-25T15:01:32.011Z (about 2 years ago)
- Topics: arrow, aws, fluentd, fluentd-plugin, parquet, s3
- Language: Ruby
- Homepage:
- Size: 68.4 KB
- Stars: 2
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# fluent-plugin-s3-arrow
[Gem Version](https://badge.fury.io/rb/fluent-plugin-s3-arrow)
Extends the [fluent-plugin-s3](https://github.com/fluent/fluent-plugin-s3) compression algorithm so that buffered chunks can be stored in columnar formats (Arrow, Feather, and Parquet) using [red-arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow).
## Installation
### Requirements
- Apache Arrow GLib and Apache Parquet GLib
  - See the [Apache Arrow install document](https://arrow.apache.org/install/) for details.
- [red-arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
- [red-parquet](https://github.com/apache/arrow/tree/master/ruby/red-parquet)
### RubyGems
```
$ gem install fluent-plugin-s3-arrow
```
### Bundler
Add the following line to your Gemfile:
```ruby
gem "fluent-plugin-s3-arrow"
```
And then execute:
```
$ bundle
```
## Configuration
An example fluent-plugin-s3-arrow configuration:
```
@type s3
# fluent-plugin-s3 configurations ...
@type json # This plugin currently supports only json formatter.
store_as arrow
format parquet
compression gzip
schema_from static
schema [
{"name": "test_string", "type": "string"},
{"name": "test_uint64", "type": "uint64"}
]
```
### format and compression
This plugin supports several columnar formats and compression codecs through red-arrow. The valid combinations of `format` and `compression` are:
| format | compression |
| ---- | ---- |
| arrow | gzip, zstd |
| feather | zstd |
| parquet | gzip, snappy, zstd |
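The table works as a lookup from `format` to the codecs it accepts. As a minimal sketch, that lookup can be written in Ruby, for example to check a configuration before deploying it. `VALID_COMPRESSIONS` and `valid_combination?` are hypothetical names for illustration, not part of the plugin's API.

```ruby
# Hypothetical lookup mirroring the format/compression table.
VALID_COMPRESSIONS = {
  "arrow"   => %w[gzip zstd],
  "feather" => %w[zstd],
  "parquet" => %w[gzip snappy zstd],
}.freeze

# Returns true when +compression+ is listed for +format+.
# Unknown formats accept no codec at all.
def valid_combination?(format, compression)
  VALID_COMPRESSIONS.fetch(format, []).include?(compression)
end

puts valid_combination?("parquet", "snappy") # prints "true"
puts valid_combination?("feather", "gzip")   # prints "false"
```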
### schema
The schema of the columnar output. Set `schema_from` to choose where it comes from: `static` or `glue`.
#### schema_from static
Set the schema statically.
```
schema_from static
schema [
{"name": "test_string", "type": "string"},
{"name": "test_uint64", "type": "uint64"}
]
```
##### schema (required)
An array of objects, each giving the `name` and `type` of one field.
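Since the plugin only supports the `json` formatter, the data it works from is newline-delimited JSON, one record per line. The sketch below is a plain-Ruby illustration, under that assumption, of how a static schema can select and order columns from such records. `to_columns` is a hypothetical helper; it does not reproduce how the plugin actually hands data to red-arrow.

```ruby
require "json"

# Static schema in the same shape as the `schema` parameter.
SCHEMA = JSON.parse(<<~JSON)
  [
    {"name": "test_string", "type": "string"},
    {"name": "test_uint64", "type": "uint64"}
  ]
JSON

# Hypothetical helper: turns newline-delimited JSON records into a hash of
# column name => array of values, in schema order. Keys missing from a
# record become nil; keys not in the schema are ignored.
def to_columns(schema, chunk)
  records = chunk.each_line
                 .reject { |line| line.strip.empty? }
                 .map { |line| JSON.parse(line) }
  schema.to_h do |field|
    name = field.fetch("name")
    [name, records.map { |record| record[name] }]
  end
end

chunk = <<~NDJSON
  {"test_string": "a", "test_uint64": 1}
  {"test_string": "b", "extra": true}
NDJSON

columns = to_columns(SCHEMA, chunk)
puts columns["test_uint64"].inspect # prints "[1, nil]"
```

The `type` entries are not used here; in a real columnar writer they would decide how each column's values are encoded.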
#### schema_from glue
Retrieve the schema from the AWS Glue Data Catalog.
```
schema_from glue
catalog test_catalog
database test_db
table test_table
```
##### catalog
The name of the data catalog for which to retrieve the definition. The default is the same as for the [AWS API CatalogId](https://docs.aws.amazon.com/glue/latest/webapi/API_GetTable.html): the AWS account ID.
##### database
The name of the database for which to retrieve the definition. The default value is `default`.
##### table (required)
The name of the table for which to retrieve the definition.
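The `catalog`, `database`, and `table` parameters correspond to the arguments of the Glue `GetTable` API. The sketch below shows one way to turn a `GetTable` response into the same name/type array used by `schema_from static`. It assumes a client with the interface of `Aws::Glue::Client#get_table` from the aws-sdk-glue gem, leaves out `catalog_id` when no catalog is set so that AWS falls back to the account ID, and uses a stub client so it runs without AWS credentials. `glue_schema` and `StubGlueClient` are hypothetical; the plugin's own implementation, including any Glue-to-Arrow type mapping, may differ.

```ruby
# Hypothetical helper: fetches a table definition through a client that
# responds to get_table like Aws::Glue::Client, and returns it as
# [{"name" => ..., "type" => ...}, ...]. Types are returned as Glue reports
# them; no conversion to Arrow types is attempted here.
def glue_schema(client, table:, database: "default", catalog: nil)
  params = { database_name: database, name: table }
  # When catalog_id is omitted, Glue uses the caller's AWS account ID.
  params[:catalog_id] = catalog if catalog
  response = client.get_table(**params)
  response.table.storage_descriptor.columns.map do |column|
    { "name" => column.name, "type" => column.type }
  end
end

# Minimal stand-ins for the SDK response objects, so the sketch runs offline.
Column = Struct.new(:name, :type)
StorageDescriptor = Struct.new(:columns)
Table = Struct.new(:storage_descriptor)
Response = Struct.new(:table)

# Stub client that records the parameters it was called with.
class StubGlueClient
  attr_reader :last_params

  def get_table(**params)
    @last_params = params
    columns = [Column.new("test_string", "string"), Column.new("test_uint64", "bigint")]
    Response.new(Table.new(StorageDescriptor.new(columns)))
  end
end

client = StubGlueClient.new
p glue_schema(client, table: "test_table", database: "test_db", catalog: "test_catalog")
p client.last_params
```

With the real SDK, an `Aws::Glue::Client.new` instance would be passed instead of the stub.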
## License
Apache License, Version 2.0