https://github.com/tenmax/cqlkit

CLI tool to export Cassandra query as CSV and JSON format.

# CQLKIT
*cqlkit* is a CLI tool that exports the results of a Cassandra query to CSV or JSON. Cassandra is not well suited to ad-hoc queries, so *cqlkit* lets you export query results as semi-structured (JSON) or structured (CSV) data. Many [tools](#recommended-3rd-party-tools) can then query or process these formats.

Here are some simple examples.

Export selected columns of the `system.peers` table in a Cassandra cluster as JSON.

```bash
cql2json -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"
```

Export selected columns of the `system.peers` table in a Cassandra cluster as CSV.

```bash
cql2csv -q "select peer, data_center, host_id, preferred_ip, rack, release_version from system.peers"
```

# Requirements

- Java 8

# Installation

## General

1. Download from [release](https://github.com/tenmax/cqlkit/releases) page.
2. Unzip the package.
3. Add `$CQLKIT_PATH/bin` to the *PATH* environment variable
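For step 3, here is a minimal sketch for bash-like shells. It assumes the package was unzipped to `~/cqlkit`, which is a hypothetical location; substitute the directory you actually extracted it to.

```bash
#!/usr/bin/env bash
# Assumption: the release package was unzipped to ~/cqlkit (hypothetical path).
CQLKIT_PATH="$HOME/cqlkit"

# Prepend the bin directory so cql2csv and cql2json resolve from any directory.
export PATH="$CQLKIT_PATH/bin:$PATH"

# To make this permanent, append the same line to your shell profile, e.g.:
#   echo 'export PATH="$HOME/cqlkit/bin:$PATH"' >> ~/.bashrc

# Verify the installation (-v prints the version, per the usage text below).
if command -v cql2csv >/dev/null 2>&1; then
  cql2csv -v
fi
```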

## Mac

Install cqlkit via [Homebrew](http://brew.sh/).

```bash
brew update
brew install cqlkit
```

To upgrade cqlkit:

```bash
brew update
brew upgrade cqlkit
```

## Docker

Run cqlkit via [Docker](https://hub.docker.com/r/tenmax/cqlkit).

```bash
docker run --rm -it tenmax/cqlkit
```

# Usage
## CQL2CSV

```
usage: cql2csv [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File                    The file to use as CQL query. If both FILE and QUERY are
                        omitted, query will be read from STDIN.

-c                      The contact point. if use multi
                        contact points, use ',' to separate
                        multi points
--connect-timeout       Connection timeout in seconds;
                        default: 5
--consistency           The consistency level. The level
                        should be 'any', 'one', 'two',
                        'three', 'quorum', 'all',
                        'local_quorum', 'each_quorum',
                        'serial' or 'local_serial'.
--cqlshrc               Use an alternative cqlshrc file
                        location, path.
--date-format           Use a custom date format. Default is
                        "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
--fetchSize             The fetch size. Default is 5000
-h,--help               Show the help and exit
-H,--no-header-row      Do not output column names.
-k                      The keyspace to use.
-l,--linenumbers        Insert a column of line numbers at
                        the front of the output. Useful when
                        piping to grep or as a simple primary
                        key.
-p                      The password to authenticate.
-r                      The port to connect to Cassandra, defaults to 9042.
-P,--parallel           The level of parallelism to run the
                        task. Default is sequential.
-q,--query              The CQL query to execute. If
                        specified, it overrides FILE and
                        STDIN.
--query-partition-keys  Query the partition key(s) for a
                        column family.
--query-ranges          The CQL query would be splitted by
                        the token ranges. WHERE clause is not
                        allowed in the CQL query
--request-timeout       Request timeout in seconds; default:
                        12
-u                      The user to authenticate.
-v,--version            Print the version
```
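As a sketch of how these options combine, the functions below wrap `cql2csv` calls using only flags listed in the usage text. The contact points, keyspace, table, query file, and the `CASSANDRA_USER`/`CASSANDRA_PASSWORD` environment variables are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Placeholders: 10.0.0.1/10.0.0.2, my_keyspace, events, CASSANDRA_USER, CASSANDRA_PASSWORD.
set -euo pipefail

export_events_csv() {
  # -c: two contact points separated by ','
  # -r: native protocol port (9042 is already the default)
  # -k: keyspace; -u/-p: credentials, read from the environment at call time
  # --consistency: one of the documented levels
  # -H: omit the header row, e.g. when appending to an existing file
  cql2csv \
    -c 10.0.0.1,10.0.0.2 \
    -r 9042 \
    -k my_keyspace \
    -u "$CASSANDRA_USER" -p "$CASSANDRA_PASSWORD" \
    --consistency local_quorum \
    -H \
    -q "select * from events" > events.csv
}

# Long queries can live in a file: per the usage text, FILE is used when -q is
# omitted, and the query is read from STDIN when both are omitted.
export_from_file()  { cql2csv -k my_keyspace "$1"; }
export_from_stdin() { cql2csv -k my_keyspace < "$1"; }
```

Set the two environment variables before calling `export_events_csv`; with `set -u`, an unset variable stops the script instead of sending an empty credential.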

## CQL2JSON
```
usage: cql2json [-c contactpoint] [-r cassandraPort] [-q query] [FILE]
File                    The file to use as CQL query. If both FILE and QUERY are
                        omitted, query will be read from STDIN.

-c                      The contact point. if use multi
                        contact points, use ',' to separate
                        multi points
--connect-timeout       Connection timeout in seconds;
                        default: 5
--consistency           The consistency level. The level
                        should be 'any', 'one', 'two',
                        'three', 'quorum', 'all',
                        'local_quorum', 'each_quorum',
                        'serial' or 'local_serial'.
--cqlshrc               Use an alternative cqlshrc file
                        location, path.
--date-format           Use a custom date format. Default is
                        "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
--fetchSize             The fetch size. Default is 5000
-h,--help               Show the help and exit
-j,--json-columns       The columns that contains json
                        string. The content would be used as
                        json object instead of plain text.
                        Columns are separated by comma.
-k                      The keyspace to use.
-l,--linenumbers        Insert a column of line numbers at
                        the front of the output. Useful when
                        piping to grep or as a simple primary
                        key.
-p                      The password to authenticate.
-r                      The port to connect to Cassandra, defaults to 9042.
-P,--parallel           The level of parallelism to run the
                        task. Default is sequential.
-q,--query              The CQL query to execute. If
                        specified, it overrides FILE and
                        STDIN.
--query-partition-keys  Query the partition key(s) for a
                        column family.
--query-ranges          The CQL query would be splitted by
                        the token ranges. WHERE clause is not
                        allowed in the CQL query
--request-timeout       Request timeout in seconds; default:
                        12
-u                      The user to authenticate.
-v,--version            Print the version
```
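A sketch of two `cql2json`-specific uses, with a hypothetical table `ks.events` whose `payload` and `metadata` columns store JSON text: `-j` embeds those columns as JSON objects rather than strings, and `--query-ranges` with `-P` splits a full-table scan by token range. Treating `--query-ranges` as a switch and passing `-P` an integer are assumptions based on the option descriptions above.

```bash
#!/usr/bin/env bash
# Hypothetical table ks.events; payload and metadata hold JSON strings.
set -euo pipefail

export_events_json() {
  # -j: emit payload and metadata as JSON objects instead of plain text
  cql2json -j payload,metadata \
    -q "select id, payload, metadata from ks.events" > events.json
}

export_full_table_json() {
  # --query-ranges: split the scan by token range (no WHERE clause allowed)
  # -P 4: assumed to process four ranges in parallel; the default is sequential
  cql2json --query-ranges -P 4 -q "select * from ks.events" > events-full.json
}
```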

# cqlsh
## Set up the cqlshrc
To connect to a Cassandra cluster, you can use `-c` and `-k` to specify the contact point and keyspace, respectively. However, preparing a [cqlshrc](http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlsh.html#refCqlsh__cqlshUsingCqlshrc) file is recommended because it simplifies your commands. *cqlshrc* is the configuration file used by cqlsh, and *cqlkit* reads the same file to connect to your cluster. The setup steps are:

1. Create the cqlshrc file at `~/.cassandra/cqlshrc`.
2. Add your connection settings, following this example format:

```ini
[authentication]
keyspace = system

[connection]
hostname = 192.168.59.103
port = 9042

; vim: set ft=dosini :
```
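If you work with more than one cluster, the `--cqlshrc` option lets you keep a separate file per cluster instead of editing `~/.cassandra/cqlshrc`. This sketch writes such a file and points `cql2csv` at it; the file name `staging.cqlshrc` and the host `10.0.0.5` are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Placeholders: staging.cqlshrc (file name) and 10.0.0.5 (cluster host).
set -euo pipefail

CQLSHRC_FILE="${CQLSHRC_FILE:-./staging.cqlshrc}"

# Same format as the example above; the quoted 'EOF' prevents shell expansion.
cat > "$CQLSHRC_FILE" <<'EOF'
[authentication]
keyspace = system

[connection]
hostname = 10.0.0.5
port = 9042
EOF

# Use the alternative file instead of the default location.
if command -v cql2csv >/dev/null 2>&1; then
  cql2csv --cqlshrc "$CQLSHRC_FILE" -q "select peer, data_center from system.peers"
fi
```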

## Import data from a CSV file

```
$ cql2csv -q "select text_col from ks.tbl" > example.csv

$ ./cqlsh localhost
cqlsh> COPY ks.tbl FROM 'example.csv' WITH ESCAPE='"' AND HEADER=TRUE
```
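The same round trip can be scripted non-interactively with cqlsh's `-e` option. This sketch assumes a hypothetical `ks.tbl` whose primary key is `id`. Listing the columns in `COPY` lets it load a CSV that covers only some of the table's columns, as long as the primary key columns are included. Because `cql2csv` writes a header row unless `-H` is given, the import keeps `HEADER=TRUE`.

```bash
#!/usr/bin/env bash
# Hypothetical table ks.tbl with primary key id and a text column text_col.
set -euo pipefail

copy_text_col() {
  # Export the key and the text column; the first CSV row holds the column names.
  cql2csv -q "select id, text_col from ks.tbl" > example.csv
  # Import with an explicit column list; \" inside the double quotes yields ESCAPE='"'.
  cqlsh localhost -e "COPY ks.tbl (id, text_col) FROM 'example.csv' WITH ESCAPE='\"' AND HEADER=TRUE"
}
```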

# Recommended 3rd Party Tools

- [csvkit](https://csvkit.readthedocs.org/en/0.9.1/) - A toolkit for handling CSV files, including many useful CLI tools.

- [q](https://github.com/harelba/q) - Another CSV tool, focused on running SQL-like queries against CSV files.

- [json2csv](https://github.com/jehiah/json2csv) - Converts JSON to CSV.

- [jq](http://stedolan.github.io/jq/) - a lightweight and flexible command-line JSON processor.
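A sketch of piping cqlkit output into these tools, reusing the `system.peers` query from the examples at the top. It assumes csvkit, q, and jq are installed; the function names are hypothetical. `jq .` only pretty-prints, so it makes no assumption about how `cql2json` lays out its rows.

```bash
#!/usr/bin/env bash
# Assumes csvkit (csvlook), q, and jq are on PATH; function names are hypothetical.
set -euo pipefail

PEERS_QUERY="select peer, data_center, rack from system.peers"

# csvkit: render the CSV as a readable table
peers_table() { cql2csv -q "$PEERS_QUERY" | csvlook; }

# q: run SQL over the CSV on stdin (-H: first row is a header, -d ,: comma-delimited)
peers_per_dc() {
  cql2csv -q "$PEERS_QUERY" | q -H -d , "select data_center, count(*) from - group by data_center"
}

# jq: pretty-print the JSON export
peers_pretty_json() { cql2json -q "$PEERS_QUERY" | jq .; }
```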