Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/civitaspo/embulk-filter-to_csv
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/civitaspo/embulk-filter-to_csv
- Owner: civitaspo
- License: mit
- Created: 2016-03-03T14:45:52.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2020-12-13T03:59:56.000Z (about 4 years ago)
- Last Synced: 2024-05-01T15:28:41.084Z (8 months ago)
- Language: Java
- Size: 71.3 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
README
# To Csv filter plugin for Embulk
Converts a record to CSV. This filter plugin is a port of [embulk-formatter-csv](https://github.com/embulk/embulk/blob/c37ddc30520c10760f93f1e489890ab639859dd4/embulk-standards/src/main/java/org/embulk/standards/CsvFormatterPlugin.java).
## Overview
* **Plugin type**: filter
## Configuration
cf. http://www.embulk.org/docs/built-in.html#csv-formatter-plugin

|name|type|description|required?|
|:---|:---|:---|:---|
|column_name|string|Name of the column that holds the record converted to a single CSV value| `"payload"` by default|
|delimiter|string|Delimiter character, such as `,` for CSV, `"\t"` for TSV, `"\|"`, or any other single-byte character| `,` by default|
|quote|string|The character surrounding a quoted value| `"` by default|
|quote_policy|enum|Policy for quoting (`ALL`, `MINIMAL`, `NONE`) (see below)| `MINIMAL` by default|
|escape|string|Character used to escape the quote character|same as `quote` by default (\*1)|
|header_line|boolean|If true, write a header line with the column names as the first line| `false` by default|
|null_string|string|String written for `NULL` values|empty by default|
|newline|enum|Newline character (`CRLF`, `LF` or `CR`)| `CRLF` by default|
|newline_in_field|enum|Newline character in each field (`CRLF`, `LF`, `CR`)| `LF` by default|
|charset|enum|Character encoding (e.g. `ISO-8859-1`, `UTF-8`)| `UTF-8` by default|
|default_timezone|string|Time zone of timestamp columns. This can be overridden for each column using `column_options`| `UTC` by default|
|column_options|hash|See below|optional|

(\*1): If `quote_policy` is `NONE`, the `quote` option is ignored and the default `escape` is `\`.

The `quote_policy` option determines which fields are quoted.

|name|description|
|:---|:---|
| `ALL`|Quote all fields|
| `MINIMAL`|Quote only fields that contain the delimiter, the quote character, or any line-terminator character|
| `NONE`|Never quote fields. When the delimiter occurs in a field, it is escaped with the escape character|

The `column_options` option is a map whose keys are column names and whose values are configurations with the following parameters:

|name|type|description|required?|
|:---|:---|:---|:---|
|timezone|string|Time zone to use if this column is a timestamp. If not set, `default_timezone` is used.|optional|
|format|string|Timestamp format to use if this column is a timestamp.| `%Y-%m-%d %H:%M:%S.%6N %z` by default|

## Why is this needed?
Some output plugins cannot use a formatter plugin because they do not inherit from `FileOutputPlugin`, yet they sometimes still need a formatter.
In that case, this plugin is useful. One example is [embulk-output-bigquery](https://github.com/sonots/embulk-output-bigquery/blob/ruby/README.md#formatter-performance-issue).

## Example
```yaml
filters:
  - type: to_csv
    column_name: payload
    delimiter: "\t"
    newline: CRLF
    newline_in_field: LF
    charset: UTF-8
    quote_policy: MINIMAL
    quote: '"'
    escape: "\\"
    null_string: "\\N"
    default_timezone: 'UTC'
    column_options:
      time: {format: '%Y-%m-%d %H:%M:%S', timezone: 'America/Los_Angeles'}
```
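
To make the `quote_policy` table in the Configuration section more concrete, here is a minimal Python sketch of how the three policies could treat a single field. It is not the plugin's Java implementation: `format_field` and `to_payload` are hypothetical helper names, leaving `NULL` values unquoted is an assumption, and `newline_in_field` handling is left out.

```python
def format_field(value, policy="MINIMAL", delimiter=",", quote='"',
                 escape=None, null_string=""):
    """Render one value following the quote_policy table (illustrative only)."""
    if value is None:
        # NULL values are written as null_string (assumed to stay unquoted).
        return null_string
    if escape is None:
        # Per note (*1): escape defaults to the quote character,
        # except under NONE, where it defaults to a backslash.
        escape = "\\" if policy == "NONE" else quote
    if policy == "NONE":
        # Never quote; escape each delimiter that occurs in the field.
        return value.replace(delimiter, escape + delimiter)
    special = (delimiter, quote, "\r", "\n")
    if policy == "ALL" or any(ch in value for ch in special):
        # Escape embedded quote characters, then wrap the field in quotes.
        return quote + value.replace(quote, escape + quote) + quote
    return value


def to_payload(values, **options):
    """Join formatted fields into one string (hypothetical helper)."""
    delimiter = options.get("delimiter", ",")
    return delimiter.join(format_field(v, **options) for v in values)


if __name__ == "__main__":
    row = ["plain", "a,b", 'say "hi"', None]
    for policy in ("ALL", "MINIMAL", "NONE"):
        print(policy, to_payload(row, policy=policy))
```

Running the script prints the same row under each policy, which makes it easy to compare how a field containing the delimiter or the quote character is handled.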
## Run the example
```
$ ./gradlew classpath
$ embulk run example/config.yml -Ilib
```

## Build
```
$ ./gradlew gem # -t to watch for file changes and rebuild continuously
```