https://github.com/joker1007/embulk-output-cassandra


# Cassandra output plugin for Embulk

![Java CI](https://github.com/joker1007/embulk-output-cassandra/workflows/Java%20CI/badge.svg)

Apache Cassandra output plugin for Embulk.

## Compatibility

| embulk-output-cassandra | embulk | datastax-driver-core |
| --------------------- | --------------------------------------- | --------------------- |
| 0.6.x | 0.11.x or later | 4.x |
| 0.5.x | 0.9.x or later (may not work on 0.11.x) | 3.11.x |

## Breaking Changes

### 0.6.0
- `timestamp` column accepts a string in Java's `ISO_INSTANT` format.
- `timestamp` column accepts long and double as epoch milliseconds (before: epoch seconds).
- `date` column accepts long as days since the epoch (before: not supported).
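
The 0.6.0 rules above can be sketched with `java.time`. This is only an illustration, not the plugin's actual code: the class and method names are hypothetical, and truncating the fractional part of a double is an assumption of this sketch.

```java
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper, not part of the plugin: it restates the 0.6.0 conversion rules.
final class TemporalConversionSketch {
    // String input: parsed with Java's ISO_INSTANT format, e.g. "2020-01-01T00:00:00Z".
    static Instant timestampFromString(String value) {
        return Instant.parse(value);
    }

    // long input: interpreted as epoch milliseconds (before 0.6.0: epoch seconds).
    static Instant timestampFromLong(long value) {
        return Instant.ofEpochMilli(value);
    }

    // double input: also epoch milliseconds. Dropping the fraction is an assumption here.
    static Instant timestampFromDouble(double value) {
        return Instant.ofEpochMilli((long) value);
    }

    // long input for a `date` column: days since 1970-01-01.
    static LocalDate dateFromLong(long value) {
        return LocalDate.ofEpochDay(value);
    }

    public static void main(String[] args) {
        System.out.println(timestampFromString("2020-01-01T00:00:00Z"));
        System.out.println(timestampFromLong(1577836800000L));
        System.out.println(dateFromLong(18262L));
    }
}
```

When upgrading from 0.5.x, input that carries epoch seconds in a long or double column needs to be multiplied by 1000 before it reaches a `timestamp` column.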

## Overview

* **Plugin type**: output
* **Load all or nothing**: no
* **Resume supported**: yes
* **Cleanup supported**: no

## Caution
Currently, the netty components this plugin depends on conflict in version with the ones used by embulk-core.

This problem is severe.

I tested this plugin on embulk-0.9.7, but future Embulk versions may break it.

## Supported Data Types

| CQL Type | Embulk Type | Description |
| -------- | ----------- | -------------- |
| ascii | string, boolean, long, double, timestamp, json | use `toString` or `toJson` |
| bigint | string, boolean(as 0 or 1), long, double | |
| blob | unsupported | |
| boolean | boolean, long, double | 0 == false, 1 == true |
| counter | unsupported | |
| date | string, long, timestamp | long as days from epoch, timestamp as UTC timestamp |
| decimal | string, boolean(as 0 or 1), long, double | |
| double | string, boolean(as 0 or 1), long, double | |
| float | string, boolean(as 0 or 1), long, double | |
| inet | string | |
| int | string, boolean(as 0 or 1), long, double | overflowed value is reset to 0 |
| list | json | |
| map (support only text key) | json | |
| set | json | |
| smallint | string, boolean(as 0 or 1), long, double | overflowed value is reset to 0 |
| text | string, boolean, long, double, timestamp, json | use `toString` or `toJson` |
| time | string, long, double, timestamp | long and double as nanoseconds of day, timestamp as UTC timestamp |
| timestamp | string, long, double, timestamp | string as Java's ISO_INSTANT format, long and double as epoch millis |
| timeuuid | null | |
| uuid | null | |
| varchar | string, boolean, long, double, timestamp, json | use `toString` or `toJson` |
| varint | string, boolean(as 0 or 1), long, double | |
| UDT | unsupported | |
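
A few rows of the table can be restated as code. The following is a minimal sketch under assumptions, not the plugin's implementation: the class and method names are hypothetical, and only the boolean, `int`, `smallint`, and `time` rules from the table are covered.

```java
import java.time.LocalTime;

// Hypothetical helper, not the plugin's code: it restates a few rows of the table.
final class ValueConversionSketch {
    // boolean into a numeric column (bigint, int, double, ...): true -> 1, false -> 0.
    static long booleanAsLong(boolean value) {
        return value ? 1L : 0L;
    }

    // long into a CQL int: a value outside the 32-bit range is reset to 0.
    static int longAsInt(long value) {
        return (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) ? 0 : (int) value;
    }

    // long into a CQL smallint: a value outside the 16-bit range is reset to 0.
    static short longAsSmallint(long value) {
        return (value < Short.MIN_VALUE || value > Short.MAX_VALUE) ? 0 : (short) value;
    }

    // long into a CQL time: nanoseconds since midnight.
    static LocalTime longAsTime(long nanoOfDay) {
        return LocalTime.ofNanoOfDay(nanoOfDay);
    }

    public static void main(String[] args) {
        System.out.println(booleanAsLong(true));
        System.out.println(longAsInt(2147483648L));   // one above Integer.MAX_VALUE
        System.out.println(longAsSmallint(40000L));   // above Short.MAX_VALUE
        System.out.println(longAsTime(3_600_000_000_000L));
    }
}
```

Because an overflowed value is reset to 0 rather than rejected, it may be worth validating ranges upstream when a stored 0 would be ambiguous.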

## Insert Behavior
If an Embulk record does not have a column, that column is treated as `unset`.
If a record with the same key already exists, the column is not touched.

### Counter table
This plugin supports counter tables.

However, a counter table supports only increment and decrement updates.

For that reason, this plugin uses the input value as the increment value.

For example, if the input data is `{id: 1, count: 5}`, the executed statement is `UPDATE tablename SET count = count + 5 WHERE id = 1`.
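
The mapping from an input record to a counter update can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the class and method names are hypothetical, and values are inlined into the string only to reproduce the statement shown in the example. Real code would normally use a prepared statement with bound values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not the plugin's code: builds the counter UPDATE text for one record.
final class CounterUpdateSketch {
    // keys: primary key columns and their values; increments: counter columns and their deltas.
    static String buildCounterUpdate(String table, Map<String, ?> keys, Map<String, Long> increments) {
        // Each counter column is updated relative to its current value.
        StringJoiner set = new StringJoiner(", ");
        for (Map.Entry<String, Long> e : increments.entrySet()) {
            set.add(e.getKey() + " = " + e.getKey() + " + " + e.getValue());
        }
        // The primary key columns select the row to update.
        StringJoiner where = new StringJoiner(" AND ");
        for (Map.Entry<String, ?> e : keys.entrySet()) {
            where.add(e.getKey() + " = " + e.getValue());
        }
        return "UPDATE " + table + " SET " + set + " WHERE " + where;
    }

    public static void main(String[] args) {
        Map<String, Object> keys = new LinkedHashMap<>();
        keys.put("id", 1);
        Map<String, Long> increments = new LinkedHashMap<>();
        increments.put("count", 5L);
        System.out.println(buildCounterUpdate("tablename", keys, increments));
    }
}
```

Since the input value is a delta, loading the same record twice increments the counter twice.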

## Configuration

- **hosts**: list of seed hosts (list, required)
- **port**: port number for the Cassandra cluster (integer, default: `9042`)
- **username**: cluster username (string, default: `null`)
- **password**: cluster password (string, default: `null`)
- **cluster_name**: cluster name (string, default: `null`)
- **keyspace**: target keyspace name (string, required)
- **table**: target table name (string, required)
- **mode**: `insert`, `update`, or `delete` (string, default: `"insert"`)
- **if_not_exists**: Add "IF NOT EXISTS" to INSERT query (boolean, default: `false`)
- **if_exists**: Add "IF EXISTS" to UPDATE query (boolean, default: `false`)
- **ttl**: Add "TTL" to INSERT query (integer, default: `null`)
- **idempotent**: Treat INSERT query as idempotent (boolean, default: `false`)
- **connect_timeout**: connect timeout in milliseconds (integer, default: `5000`)
- **request_timeout**: timeout for each request in milliseconds (integer, default: `12000`)

## Example

```yaml
out:
  type: cassandra
  hosts:
    - 127.0.0.1
  port: 9042
  keyspace: sample_keyspace
  table: sample_table
  idempotent: true
```

## Build

```
$ ./gradlew gem # add -t to watch for file changes and rebuild continuously
```