An open API service indexing awesome lists of open source software.

https://github.com/sonots/embulk-output-vertica

Embulk output plugin to load/insert data into Vertica
https://github.com/sonots/embulk-output-vertica

embulk jruby vertica

Last synced: 2 months ago
JSON representation

Embulk output plugin to load/insert data into Vertica

Awesome Lists containing this project

README

        

# Vertica output plugin for Embulk

## Overview

* **Plugin type**: output
* **Resume supported**: no
* **Cleanup supported**: yes
* **Dynamic table creating**: yes

## Configuration

- **host**: hostname (string, default: localhost)
- **port**: port number (integer, default: 5433)
- **user**: user name (string, required)
- **password**: password (string, default: '')
- **database**: database name (string, default: vdb)
- **schema**: schema name (string, default: public)
- **table**: table name (string, required)
- **mode**: "insert", or "replace". See bellow. (string, default: insert)
- **copy_mode**: specifies how data is loaded into the database. See vertica documents for details. (`AUTO`, `DIRECT`, or `TRICKLE`. default: `AUTO`)
- **pool**: number of output threads, this number controls number of concurrency to issue COPY statements (integer, default: processor_count, that is, number of threads in input plugin)
- **abort_on_error**: stops the COPY command if a row is rejected and rolls back the command. No data is loaded. (bool, default: false)
- **compress**: compress input (`GZIP`, or `UNCOMPRESSED`, default: `UNCOMPRESSED`)
- **reject_on_materialized_type_error**: uses `reject_on_materialized_type_error` option for fjsonparser(). This rejects rows if any of column types and value types do not fit, ex) double value into INT column fails. See vertica documents for details. (bool, default: false)
- **default_timezone**: the default timezone for column_options (string, default is "UTC")
- **column_options**: advanced: a key-value pairs where key is a column name and value is options for the column.
- **type**: type of a column when this plugin creates new tables such as `VARCHAR(255)`, `INTEGER NOT NULL UNIQUE`. This is used on creating intermediate tables (insert and truncate_insert modes) and on creating a new target table. (string, default: depends on input column type, see below)
- boolean: `BOOLEAN`
- long: `INT` (same with `BIGINT` in vertica)
- double: `FLOAT` (same with `DOUBLE PRECISION` in vertica)
- string: `VARCHAR`
- timestamp: `TIMESTAMP`
- **value_type**: The types (embulk types) of values to convert (string, default: no conversion. See below for available types)
- boolean: `boolean`, `string`
- long: `boolean`, `long`, `double`, `string`, `timestamp`
- double: `boolean`, `long`, `double`, `string`, `timestamp`
- string: `boolean`, `long`, `double`, `string`, `timestamp`
- timestamp: `boolean`, `long`, `double`, `string`, `timestamp`
- **timestamp_format**: timestamp format to convert into/from `timestamp` (string, default is "%Y-%m-%d %H:%M:%S %z")
- **timezone**: timezone to convert into/from `timestamp` (string, default is `default_timezone`).
- **json_payload**: Assuming first columns of records are json string, COPY each json into vertica directly using fjsonparser. This would improve performance by avoiding construction of json in this jruby plugin. ToDo: auto table creation is not supported for this mode yet (bool, default is false)
- **resource_pool**: Configure resource pool (string, default is nil)

### Modes

* **insert**:
* Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, runs `INSERT INTO SELECT * FROM `
* **replace**:
* Behavior: This mode copies rows to an intermediate table first. If all those tasks run correctly, swaps the target table and the intermediate table, and drops the old target table
* **drop_insert**:
* Behavior: Same with insert mode excepting that it drops the target table first. This is aimed to use for development.

## Example

```yaml
out:
type: vertica
host: 127.0.0.1
user: dbadmin
password: xxxxxxx
database: vdb
schema: sandbox
table: embulk_test
copy_mode: DIRECT
abort_on_error: true
column_options:
id: {type: INT}
name: {type: VARCHAR(255)}
date: {type: DATE, value_type: timestamp, timezone: "+09:00"}
```

## Development

Run example:

```
$ embulk bundle install --path vendor/bundle
$ embulk -J-O -R--dev run -b . -l debug example/example.yml
```

Release gem:

```
$ bundle exec rake release
```

## ChangeLog

[CHANGELOG.md](CHANGELOG.md)