https://github.com/sonots/embulk-output-vertica
Embulk output plugin to load/insert data into Vertica
- Host: GitHub
- URL: https://github.com/sonots/embulk-output-vertica
- Owner: sonots
- License: mit
- Created: 2015-02-14T14:39:15.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-10-31T06:13:52.000Z (over 8 years ago)
- Last Synced: 2025-02-06T19:02:15.981Z (3 months ago)
- Topics: embulk, jruby, vertica
- Language: Ruby
- Homepage:
- Size: 121 KB
- Stars: 4
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
# Vertica output plugin for Embulk
## Overview
* **Plugin type**: output
* **Resume supported**: no
* **Cleanup supported**: yes
* **Dynamic table creating**: yes

## Configuration
- **host**: hostname (string, default: localhost)
- **port**: port number (integer, default: 5433)
- **user**: user name (string, required)
- **password**: password (string, default: '')
- **database**: database name (string, default: vdb)
- **schema**: schema name (string, default: public)
- **table**: table name (string, required)
- **mode**: "insert", or "replace". See below. (string, default: insert)
- **copy_mode**: specifies how data is loaded into the database. See the Vertica documentation for details. (`AUTO`, `DIRECT`, or `TRICKLE`. default: `AUTO`)
- **pool**: number of output threads. This controls how many COPY statements are issued concurrently (integer, default: processor_count, that is, number of threads in input plugin)
- **abort_on_error**: stops the COPY command if a row is rejected and rolls back the command. No data is loaded. (bool, default: false)
- **compress**: compress input (`GZIP`, or `UNCOMPRESSED`, default: `UNCOMPRESSED`)
- **reject_on_materialized_type_error**: uses the `reject_on_materialized_type_error` option for fjsonparser(). This rejects rows whose value types do not fit the column types, e.g. loading a double value into an INT column fails. See the Vertica documentation for details. (bool, default: false)
- **default_timezone**: the default timezone for column_options (string, default is "UTC")
- **column_options**: advanced: key-value pairs where the key is a column name and the value is the options for that column.
  - **type**: type of a column when this plugin creates new tables, such as `VARCHAR(255)` or `INTEGER NOT NULL UNIQUE`. This is used when creating intermediate tables (insert and truncate_insert modes) and when creating a new target table. (string, default: depends on input column type, see below)
    - boolean: `BOOLEAN`
    - long: `INT` (same as `BIGINT` in Vertica)
    - double: `FLOAT` (same as `DOUBLE PRECISION` in Vertica)
    - string: `VARCHAR`
    - timestamp: `TIMESTAMP`
  - **value_type**: the types (Embulk types) of values to convert to (string, default: no conversion. See below for available types)
    - boolean: `boolean`, `string`
    - long: `boolean`, `long`, `double`, `string`, `timestamp`
    - double: `boolean`, `long`, `double`, `string`, `timestamp`
    - string: `boolean`, `long`, `double`, `string`, `timestamp`
    - timestamp: `boolean`, `long`, `double`, `string`, `timestamp`
  - **timestamp_format**: timestamp format to convert into/from `timestamp` (string, default is "%Y-%m-%d %H:%M:%S %z")
  - **timezone**: timezone to convert into/from `timestamp` (string, default is `default_timezone`).
- **json_payload**: assumes the first column of each record is a JSON string and COPYs each JSON document into Vertica directly using fjsonparser. This can improve performance by avoiding the construction of JSON in this JRuby plugin. ToDo: auto table creation is not supported for this mode yet (bool, default is false)
- **resource_pool**: Configure resource pool (string, default is nil)

### Modes
* **insert**:
  * Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, it runs `INSERT INTO <target_table> SELECT * FROM <intermediate_table>`.
* **replace**:
  * Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, it swaps the target table and the intermediate table, and drops the old target table.
* **drop_insert**:
  * Behavior: Same as insert mode, except that it drops the target table first. This is intended for use during development.

## Example
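The modes described above are selected with the `mode` option. As a minimal sketch, a `replace` load with gzip-compressed input might be configured as follows. The connection values and table name are placeholders reused for illustration, not required settings, and only options listed under Configuration are used:

```yaml
out:
  type: vertica
  host: 127.0.0.1        # placeholder connection values
  user: dbadmin
  password: xxxxxxx
  database: vdb
  schema: sandbox
  table: embulk_test     # placeholder table name
  mode: replace          # load into an intermediate table, then swap it with the target
  compress: GZIP         # compress input sent to COPY
```

The following example omits `mode` and therefore uses the default `insert` mode: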
```yaml
out:
  type: vertica
  host: 127.0.0.1
  user: dbadmin
  password: xxxxxxx
  database: vdb
  schema: sandbox
  table: embulk_test
  copy_mode: DIRECT
  abort_on_error: true
  column_options:
    id: {type: INT}
    name: {type: VARCHAR(255)}
    date: {type: DATE, value_type: timestamp, timezone: "+09:00"}
```

## Development
Run example:
```
$ embulk bundle install --path vendor/bundle
$ embulk -J-O -R--dev run -b . -l debug example/example.yml
```

Release gem:
```
$ bundle exec rake release
```

## ChangeLog
[CHANGELOG.md](CHANGELOG.md)