https://github.com/civitaspo/embulk-filter-distinct
- Host: GitHub
- URL: https://github.com/civitaspo/embulk-filter-distinct
- Owner: civitaspo
- License: mit
- Created: 2015-12-05T15:28:04.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2020-12-13T03:57:28.000Z (about 4 years ago)
- Last Synced: 2024-09-07T04:41:14.859Z (4 months ago)
- Language: Java
- Homepage:
- Size: 123 KB
- Stars: 7
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
# Distinct filter plugin for Embulk
This filter returns distinct records, judged by the values of the columns you configure.
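As a rough illustration of "distinct by columns", here is a minimal Java sketch. It is not the plugin's actual implementation. It assumes a record can be modeled as a `Map` from column name to value, and the class `DistinctSketch` and its methods are hypothetical (they do not use the Embulk API). A record is kept only if its combination of values in the configured columns has not been seen before; in this single-threaded sketch, the first record for each key wins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of "distinct by columns": each record is a
// Map from column name to value. Not the plugin's source; no Embulk API used.
class DistinctSketch {
    private final List<String> columns;
    // One key per distinct combination of column values seen so far.
    private final Set<List<Object>> seen = new HashSet<>();

    DistinctSketch(List<String> columns) {
        this.columns = columns;
    }

    // Returns true if no earlier record had the same values in all
    // configured columns (emit it), false otherwise (drop it).
    boolean accept(Map<String, Object> record) {
        List<Object> key = new ArrayList<>(columns.size());
        for (String column : columns) {
            key.add(record.get(column)); // ArrayList permits null elements
        }
        return seen.add(key); // add() returns false when the key is already present
    }

    static List<Map<String, Object>> distinct(List<String> columns,
                                              List<Map<String, Object>> records) {
        DistinctSketch filter = new DistinctSketch(columns);
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            if (filter.accept(record)) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = Map.of("c0", 1, "c1", "a", "c2", "x");
        Map<String, Object> r2 = Map.of("c0", 1, "c1", "a", "c2", "y"); // same (c0, c1) as r1
        Map<String, Object> r3 = Map.of("c0", 2, "c1", "a", "c2", "z");
        List<Map<String, Object>> out = distinct(List.of("c0", "c1"), List.of(r1, r2, r3));
        System.out.println(out.size()); // prints 2
    }
}
```

Using a `List` of the column values as the `HashSet` key relies on `List.equals` and `List.hashCode` comparing elements in order, so columns not listed in `columns` (here `c2`) have no effect on whether a record is a duplicate.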
## Overview
* **Plugin type**: filter
## Configuration
- **columns**: list of column names used to decide whether two records are duplicates (array of strings, required)
## Example
```yaml
filters:
  - type: distinct
    columns: [c0, c1]
```

## Run Example
```
$ ./gradlew classpath
$ embulk run -I lib example/config.yml
```

## Note
This plugin can use a lot of memory, because it keeps every distinct combination of the configured column values in memory.
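The memory cost comes from holding one key per distinct value combination. The TODO list suggests storing a CRC32 of the values instead. The following hypothetical Java sketch shows what that could look like with `java.util.zip.CRC32`; `Crc32DistinctSketch`, `crc32Key`, and the encoding of each value by its string form are all assumptions for illustration, not the plugin's code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical sketch of the "CRC32 of values as distinct key" idea:
// stores one Long per distinct key instead of the column values themselves.
class Crc32DistinctSketch {
    private final Set<Long> seen = new HashSet<>();

    // Hashes the values in column order, using each value's string form.
    // Each value is length-prefixed so ("ab", "c") and ("a", "bc") feed
    // different bytes to the checksum; null gets its own marker (-1).
    static long crc32Key(List<?> values) {
        CRC32 crc = new CRC32();
        for (Object value : values) {
            if (value == null) {
                crc.update(ByteBuffer.allocate(4).putInt(-1).array());
                continue;
            }
            byte[] bytes = value.toString().getBytes(StandardCharsets.UTF_8);
            crc.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
            crc.update(bytes);
        }
        return crc.getValue(); // unsigned 32-bit checksum held in a long
    }

    // Caveat: two different keys can share a checksum, in which case a
    // non-duplicate record is wrongly treated as a duplicate and dropped.
    boolean accept(List<?> values) {
        return seen.add(crc32Key(values));
    }

    public static void main(String[] args) {
        Crc32DistinctSketch filter = new Crc32DistinctSketch();
        System.out.println(filter.accept(List.of(1, "a"))); // true
        System.out.println(filter.accept(List.of(1, "a"))); // false
        System.out.println(filter.accept(List.of(2, "a"))); // true
    }
}
```

The trade-off: each key becomes a fixed-size `long` no matter how long the column values are (each `HashSet` entry still carries boxing and node overhead), but CRC32 is only 32 bits wide. By the birthday bound, the chance of at least one collision passes 50% at roughly 77,000 distinct keys, and each collision silently drops a record that is not actually a duplicate. A wider hash (for example 64 or 128 bits) makes this far less likely at a modest memory cost.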
## TODO
- Reduce the filter's memory usage further, e.g. by using a CRC32 of the values as the distinct key?
- Ideas are welcome!
- Tests

## Build
```
$ ./gradlew gem # -t to watch for file changes and rebuild continuously
```