https://github.com/yuokada/embulk-output-orc
- Host: GitHub
- URL: https://github.com/yuokada/embulk-output-orc
- Owner: yuokada
- License: mit
- Created: 2017-06-29T16:44:41.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-09-09T13:36:01.000Z (3 months ago)
- Last Synced: 2024-10-19T10:41:34.379Z (2 months ago)
- Topics: embulk, embulk-output-plugin, embulk-plugin, java, orc
- Language: Scala
- Size: 383 KB
- Stars: 4
- Watchers: 2
- Forks: 4
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Codeowners: .github/CODEOWNERS
# Orc output plugin for Embulk
[![Build Status](https://github.com/yuokada/embulk-output-orc/workflows/Java%20CI/badge.svg)](https://github.com/yuokada/embulk-output-orc/actions)
[![Gem Version](https://badge.fury.io/rb/embulk-output-orc.svg)](https://badge.fury.io/rb/embulk-output-orc)

## Overview
* **Plugin type**: output
* **Load all or nothing**: no
* **Resume supported**: no
* **Cleanup supported**: yes

## Configuration
- **path_prefix**: Prefix of the output path. (string, required)
  - Supported schemes: `file`, `s3`, `s3n` and `s3a`.
- **file_ext**: Extension of the output file. (string, default: `.orc`)
- **sequence_format**: (string, default: `.%03d`)
- **buffer_size**: ORC buffer size. (integer, default: `262144` (256 KB))
- **strip_size**: ORC stripe size. (integer, default: `67108864` (64 MB))
- **block_size**: ORC block size. (integer, default: `268435456` (256 MB))
- **compression_kind**: Compression kind used for the ORC file. (string, default: `'ZLIB'`)
  - `NONE`, `ZLIB`, `SNAPPY`, `LZO`, `LZ4`
- **overwrite**: Overwrite output files if they already exist. (boolean, default: `false`)
  - Support: `LocalFileSystem`, `S3(s3, s3a, s3n)`
- **default_from_timezone**: Time zone of timestamp columns. This can be overridden for each column using `column_options`. (DateTimeZone, default: `UTC`)
- **auth_method**: Name of the mechanism used to authenticate requests. (`basic`, `env`, `instance`, `profile`, `properties`, `anonymous`, or `session`; default: `basic`)
  - see: https://github.com/embulk/embulk-input-s3#configuration
  - `env`, `basic`, `profile`, `default`, `session`, `anonymous`, `properties`
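
To illustrate how the options above combine, here is a minimal sketch of an `out` section that writes Snappy-compressed ORC files to S3 and spells out the size defaults explicitly. The bucket name and prefix are hypothetical, the `s3a://` URI form of `path_prefix` is an assumption, and `auth_method: env` assumes credentials are supplied through the environment as described in the embulk-input-s3 documentation linked above. The sketch has not been run against a real bucket.

```yaml
out:
  type: orc
  # Hypothetical bucket and prefix; the s3a:// URI form is an assumption.
  path_prefix: "s3a://example-bucket/embulk/output"
  file_ext: ".orc"
  sequence_format: ".%03d"
  compression_kind: SNAPPY
  buffer_size: 262144      # 256 KB (documented default)
  strip_size: 67108864     # 64 MB (documented default)
  block_size: 268435456    # 256 MB (documented default)
  overwrite: true
  default_from_timezone: "UTC"
  auth_method: env         # assumption: credentials come from environment variables
```

Options left at their defaults can be omitted; they are written out here only to show where each one goes.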
## Example
```yaml
out:
  type: orc
  path_prefix: "/tmp/output"
  compression_kind: ZLIB
  overwrite: true
```

## ChangeLog
### ver 0.3.4
- Bump `orc` library to `1.5.4`
- bugfix
  - https://github.com/yuokada/embulk-output-orc/pull/17

### ver 0.3.3
- bugfix
- Bump `orc` library to `1.4.4`

### ver 0.3.2
- Update `orc` libraries to `1.4.3`

### ver 0.3.0
- Change default value: (`block_size`, `buffer_size`, `strip_size`)
  - default value is Hive's default value.
    (see: https://orc.apache.org/docs/hive-config.html)

### ver 0.2.0
- support: output to s3
  - `s3n`, `s3a` protocol

### ver 0.1.0
- initial release
## Build
```
$ ./gradlew gem  # -t to watch change of files and rebuild continuously
```

## SonarQube
[embulk-output-orc](https://sonarcloud.io/dashboard?id=embulk-output-orc "embulk-output-orc - Yukihiro Okada")