Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/civitaspo/embulk-filter-join_file
Now only supports JSON format...
- Host: GitHub
- URL: https://github.com/civitaspo/embulk-filter-join_file
- Owner: civitaspo
- License: mit
- Created: 2015-10-11T02:55:27.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2021-04-24T22:26:22.000Z (over 3 years ago)
- Last Synced: 2024-05-01T15:28:41.157Z (8 months ago)
- Language: Java
- Size: 215 KB
- Stars: 5
- Watchers: 3
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Join File filter plugin for Embulk
This plugin combines the rows loaded by Embulk with rows from a file containing tabular data, based on a common field between them.
## Overview
* **Plugin type**: filter
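The join described above can be pictured as a lookup (hash) join: the file rows are indexed by their `counter_column` value, and each row loaded by Embulk looks up its `base_column` value in that index and gains the selected file columns under names that start with `joined_column_prefix`. The Java sketch below is a hypothetical illustration, not the plugin's source: rows are modeled as plain `Map`s, and `JoinSketch`, `indexByColumn`, and `join` are made-up names. It also assumes that rows without a match get `null` in the joined columns, which this README does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lookup join; not the plugin's actual implementation.
// Rows are modeled as Map<String, Object> purely for illustration.
class JoinSketch {

    // Index the file rows by the value of their counter column.
    // Assumption: if two rows share a key, the later one wins.
    static Map<Object, Map<String, Object>> indexByColumn(
            List<Map<String, Object>> fileRows, String counterColumn) {
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : fileRows) {
            index.put(row.get(counterColumn), row);
        }
        return index;
    }

    // Copy each input row and append the requested file columns, renamed with the prefix.
    static List<Map<String, Object>> join(
            List<Map<String, Object>> inputRows,
            String baseColumn,
            Map<Object, Map<String, Object>> index,
            List<String> joinedColumns,
            String prefix) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : inputRows) {
            Map<String, Object> joined = new LinkedHashMap<>(row);
            Map<String, Object> match = index.get(row.get(baseColumn));
            for (String name : joinedColumns) {
                // Assumption: rows without a match get null in every joined column.
                joined.put(prefix + name, match == null ? null : match.get(name));
            }
            out.add(joined);
        }
        return out;
    }

    public static void main(String[] args) {
        // File rows taken from the README's sample master data.
        List<Map<String, Object>> master = List.of(
                Map.of("id", 0L, "name", "civitaspo"),
                Map.of("id", 2L, "name", "mori.ogai"),
                Map.of("id", 5L, "name", "natsume.soseki"));
        List<Map<String, Object>> input = List.of(
                Map.of("name_id", 2L),
                Map.of("name_id", 7L));

        Map<Object, Map<String, Object>> index = indexByColumn(master, "id");
        List<Map<String, Object>> result =
                join(input, "name_id", index, List.of("id", "name"), "_joined_by_embulk_");
        result.forEach(System.out::println);
        // {name_id=2, _joined_by_embulk_id=2, _joined_by_embulk_name=mori.ogai}
        // {name_id=7, _joined_by_embulk_id=null, _joined_by_embulk_name=null}
    }
}
```

Indexing the file once keeps each per-row lookup roughly constant-time, at the cost of holding the whole file in memory, which suits a small master file like the one in the examples.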
## Configuration
- **base_column**: the column of the data loaded by Embulk that is used as the join key (hash, required)
  - **name**: name of the column
  - **type**: type of the column (see below)
  - **format**: format of the timestamp if type is `timestamp`
- **counter_column**: the column of the data loaded from the file that is matched against `base_column` (hash, default: `{name: id, type: long}`)
  - **name**: name of the column
  - **type**: type of the column (see below)
  - **format**: format of the timestamp if type is `timestamp`
- **joined_column_prefix**: prefix added to the names of the joined columns (string, default: `"_joined_by_embulk_"`)
- **file_path**: path of the file (string, required)
- **file_format**: format of the file (string, required; the options are `csv`, `tsv`, `yaml`, and `json`, but only `json` is implemented so far)
- **columns**: columns to read from the file and join (array of hash, required)
  - **name**: name of the column
  - **type**: type of the column (see below)
  - **format**: format of the timestamp if type is `timestamp`
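The `type` and `format` fields above decide how each value is interpreted. As a minimal sketch, assuming the values arrive as strings (as they would from a CSV or TSV file), a conversion for the column types `boolean`, `long`, `timestamp`, `double`, and `string` might look like the Java below. `ColumnTypeSketch` and `convert` are hypothetical names, not the plugin's API. The timestamp branch assumes a `java.time` pattern such as `yyyy-MM-dd HH:mm:ss` and UTC; Embulk timestamp formats are usually written in strptime style (for example `%Y-%m-%d %H:%M:%S`), so this is only a stand-in. `Instant` does keep nanosecond precision, like the `timestamp` type.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: converts a raw string value according to a column type.
// Not the plugin's actual code; names are illustrative only.
class ColumnTypeSketch {

    enum ColumnType { BOOLEAN, LONG, TIMESTAMP, DOUBLE, STRING }

    static Object convert(String raw, ColumnType type, String format) {
        switch (type) {
            case BOOLEAN:
                // Accept only "true" or "false" (case-insensitive).
                if (raw.equalsIgnoreCase("true")) return Boolean.TRUE;
                if (raw.equalsIgnoreCase("false")) return Boolean.FALSE;
                throw new IllegalArgumentException("not a boolean: " + raw);
            case LONG:
                return Long.parseLong(raw);      // 64-bit signed integer
            case DOUBLE:
                return Double.parseDouble(raw);  // 64-bit floating point number
            case TIMESTAMP: {
                // Assumption: `format` is a java.time pattern and the value is in UTC.
                // Embulk's own formats are usually strptime-style, e.g. %Y-%m-%d.
                DateTimeFormatter formatter = DateTimeFormatter.ofPattern(format);
                return LocalDateTime.parse(raw, formatter).toInstant(ZoneOffset.UTC);
            }
            case STRING:
                return raw;
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("5", ColumnType.LONG, null));    // 5
        System.out.println(convert("2015-10-11 02:55:27", ColumnType.TIMESTAMP,
                "yyyy-MM-dd HH:mm:ss"));                            // 2015-10-11T02:55:27Z
    }
}
```

Because `NumberFormatException` is a subclass of `IllegalArgumentException`, a caller can handle every malformed value in this sketch with a single `catch`.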
**type of the column**

|name|description|
|:---|:---|
|boolean|true or false|
|long|64-bit signed integers|
|timestamp|Date and time with nanosecond precision|
|double|64-bit floating point numbers|
|string|Strings|

## Example
```yaml
filters:
  - type: join_file
    base_column: {name: name_id, type: long}
    counter_column: {name: id, type: long}
    joined_column_prefix: _joined_by_embulk_
    file_path: master.json
    file_format: json
    columns:
      - {name: id, type: long}
      - {name: name, type: string}
```

## Run Example
```
$ ./gradlew classpath
$ embulk run -I lib example/config.yml
```

## Supported Data Format

- csv (**not implemented**)
- tsv (**not implemented**)
- yaml (**not implemented**)
- json

### Supported Data Format Example
#### CSV
```csv
id,name
0,civitaspo
2,mori.ogai
5,natsume.soseki
```

#### TSV

Because tabs are hard to show here, they are written as `\t`.
```tsv
id\tname
0\tcivitaspo
2\tmori.ogai
5\tnatsume.soseki
```

#### YAML

```yaml
- id: 0
  name: civitaspo
- id: 2
  name: mori.ogai
- id: 5
  name: natsume.soseki
```

#### JSON
```json
[
  {
    "id": 0,
    "name": "civitaspo"
  },
  {
    "id": 2,
    "name": "mori.ogai"
  },
  {
    "id": 5,
    "name": "natsume.soseki"
  }
]
```

## Build
```
$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously
```