https://github.com/hawkular/cassalog

A Cassandra schema change management tool for applications running on the JVM
Host: GitHub
URL: https://github.com/hawkular/cassalog
Owner: hawkular
License: apache-2.0
Created: 2016-01-22T21:41:41.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2018-04-19T20:48:33.000Z (about 8 years ago)
Last Synced: 2025-03-25T10:12:05.606Z (over 1 year ago)
Language: Groovy
Homepage:
Size: 97.7 KB
Stars: 14
Watchers: 2
Forks: 7
Open Issues: 4
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README

= Cassalog

Cassalog is a schema change management library and tool for
http://cassandra.apache.org[Apache Cassandra] that can be used with
applications running on the JVM.

== Why?

Just as application code evolves and changes, so do database schemas. If you are
building an application and intend to support upgrading from one version to
another, then managing schema changes is essential. If you are lucky, you might
be able to get by with running some simple upgrade scripts to bring the schema
up to date with the new version. This likely will not work, however, if you
support multiple upgrade paths. For example, suppose we have versions 1 and 2,
and are introducing version 3 of an application. We want to allow upgrading to
version 3 from either 1 or 2, in addition to upgrading from 1 to 2.

You could add schema upgrade logic to application code, but that is often a
less than ideal solution as it clutters the code base. Fortunately, there are
tools for managing schema changes like http://www.liquibase.org/[Liquibase],
http://flywaydb.org/[Flyway], and
http://guides.rubyonrails.org/active_record_basics.html[Active Record] for Ruby
on Rails applications. These tools, however, are designed specifically for
relational databases. I previously spent time trying to patch Liquibase to
support Cassandra but found that it was not a good fit. Cassalog is designed
solely for use with Cassandra, not for any other database systems.

Cassalog is written in Groovy. There are several reasons for this. First,
Groovy offers great interoperability with Java, making it usable and accessible
to applications running on the JVM. Groovy's dynamic and metaprogramming
features make it easy to write domain-specific languages. Groovy has multi-line
strings and string interpolation out of the box, both of which can be really
useful for writing schema change scripts. Lastly, with Cassalog, schema changes
are not written in XML or JSON. Instead they are written as Groovy scripts,
giving you the full power and flexibility of Groovy.

== Usage

The Cassalog class is the primary class with which you will interact.

[source,groovy]
----
// Groovy
def script = // load schema change script
def session = // obtain DataStax driver Session object

def cassalog = new Cassalog(session: session)
cassalog.execute(script)
----

[source,java]
----
// Java
URI script = // load schema change script
Session session = // obtain DataStax driver Session object

Cassalog cassalog = new Cassalog();
cassalog.setSession(session);
cassalog.execute(script);
----

And here is what a Cassalog script might look like:

[source,groovy]
----
createKeyspace {
  version '0.1'
  name 'my_keyspace'
  author 'admin'
  description 'Set up a keyspace for unit tests'
}

schemaChange {
  version '0.1.1'
  author 'admin'
  description 'Create table for storing time series data'
  cql """
CREATE TABLE metrics (
    id uuid,
    time timeuuid,
    value double,
    PRIMARY KEY (id, time)
)
"""
}
----

TIP: Schema changes are applied in the order that they are declared in the
script(s) regardless of the assigned versions.

== Features

* Tagging
* Execute arbitrary Groovy / Java code in schema change scripts
* Pass variables to scripts
* Changes can be stored across multiple scripts
* Schema change detection

=== Tagging

You can specify tags when running Cassalog, e.g.

[source,groovy]
----
// Groovy
def script = // load schema change script
def session = // obtain DataStax driver Session object

def cassalog = new Cassalog(session: session)
cassalog.execute(script, ['dev', 'test_data'])
----

[source,java]
----
// Java
URI script = // load schema change script
Session session = // obtain DataStax driver Session object

Cassalog cassalog = new Cassalog();
cassalog.setSession(session);
// java.util.Arrays.asList; java.util.Collections has no asList method
cassalog.execute(script, Arrays.asList("dev", "test_data"));
----

Cassalog will apply schema changes that have not already been run and that

* Do not specify any tags or
* Specify tags and include the `dev` and `test_data` tags

=== Execute arbitrary code

Cassandra is frequently used for time series data. Suppose we have a metrics
table, and we want to generate some sample data for tests.

[source,groovy]
----
schemaChange {
  version '1.0'
  cql """
CREATE TABLE metrics (
    id text PRIMARY KEY,
    time timestamp,
    value double
)
"""
}

// Build a list of INSERT statements with random sample values
testData = []
random = new Random()
10.times { i ->
  testData << "INSERT INTO metrics (id, time, value) VALUES ('$i', ${new Date().time + 100}, ${random.nextDouble()})"
}

schemaChange {
  version '1.0.1'
  tags 'test_data'
  cql testData
}
----

This script first calls the `schemaChange` function to create the metrics table.
The next few lines generate a list of INSERT statements with some test data.
Finally, we have another call to `schemaChange`. It specifies the `test_data`
tag and passes the `testData` list to the `cql` parameter.

=== Pass variables to scripts

You can pass arbitrary variables to scripts, not just strings.

[source,groovy]
----
// Groovy
def vars = [
  metricIds: ['M1', 'M2', 'M3'],
  startDate: new Date(),
  maxValue: 100,
  minValue: 50
]
cassalog.execute(script, vars)
----

[source,java]
----
// Java
// ImmutableMap comes from Guava; asList is java.util.Arrays.asList (static import)
Map vars = ImmutableMap.of(
    "metricIds", asList("M1", "M2", "M3"),
    "startDate", new Date(),
    "maxValue", 100,
    "minValue", 50
);
cassalog.execute(script, vars);
----

=== Changes can be stored across multiple scripts

You can use the `include` function to store changes in multiple scripts to
keep your schema changes more modular and better organized.

[source,groovy]
----
include '/dbchanges/base_tables.groovy'
include '/dbchanges/seed_data.groovy'
----

The `include` function currently takes a single string argument that should
specify the absolute path of a script on the classpath or from the configured
`baseScriptsPath`. `baseScriptsPath` is an absolute path to where the other
include scripts are located, e.g. `/Users/john/cassalog/scripts`.

=== Schema change detection

Cassalog does not store the CQL code associated with each schema change. It
computes a hash of the CQL and stores that instead. If the hash in the change
log differs from the hash of the CQL in the source script, Cassalog will throw
a ChangeSetAlteredException.

You will need to manually resolve the issue that caused the
ChangeSetAlteredException. Cassandra does not support transactions like a
relational database, so there is no rollback functionality to fall back on.
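
The comparison described above can be illustrated with a small Java sketch. It is not Cassalog's implementation: the class and method names are hypothetical, SHA-1 is an assumed hash function (the README does not name the one Cassalog uses), and `IllegalStateException` stands in for `ChangeSetAlteredException`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class ChangeDetectionSketch {

    // Assumption: SHA-1 over the UTF-8 bytes of the CQL text.
    public static byte[] hash(String cql) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(cql.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares each hash recorded in the change log with the hash of the
    // change declared at the same position in the script. Returns the index
    // of the first change that has not been applied yet.
    public static int firstPending(List<byte[]> appliedHashes, List<String> scriptCql) {
        for (int revision = 0; revision < appliedHashes.size(); revision++) {
            boolean missing = revision >= scriptCql.size();
            if (missing || !Arrays.equals(appliedHashes.get(revision),
                                          hash(scriptCql.get(revision)))) {
                // Stand-in for ChangeSetAlteredException
                throw new IllegalStateException(
                        "Change at revision " + revision + " was altered");
            }
        }
        return appliedHashes.size();
    }
}
```

Because only the hash is kept, a mismatch shows that a change was edited or moved, but not what the original CQL was, which is why the resolution has to be manual.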

== Change Log Table

All schema changes are recorded in the change log table, _cassalog_. The table
will be created the first time Cassalog is run. Change log data looks like:

[noformat]
----
 bucket | revision | applied_at               | author | description  | hash         | version | tags
--------+----------+--------------------------+--------+--------------+--------------+---------+-------------------
      0 |        0 | 2016-01-28 11:09:54-0500 |  admin |  First table | 0xe361957eeb |     1.0 | {'legacy'}
      0 |        1 | 2016-01-28 11:09:54-0500 |  admin | Second table | 0xf336e725d4 |     1.1 | {'legacy'}
      0 |        2 | 2016-01-28 11:09:55-0500 |  admin |  Third table | 0xcecef5f840 |     1.2 | {'legacy', 'dev'}
      0 |        3 | 2016-01-28 11:09:55-0500 |  admin | Fourth table | 0x4b5d24b77c |     1.3 | {'legacy'}
----

Here is a brief overview of the schema.

[noformat]
----
CREATE TABLE cassalog (
    bucket int,
    revision int,
    applied_at timestamp,
    author text,
    description text,
    hash blob,
    version text,
    tags set<text>,
    PRIMARY KEY (bucket, revision)
)
----

*author* +
The username, or email address, etc. of the person making the change. This is
an optional field and can be null.

*description* +
A summary of the changes. This is an optional field and can be null.

*hash* +
Cassalog does not store the CQL statements that it executes. Instead it stores a
hash that uniquely identifies the CQL statement(s). Cassalog generates this
hash value.

*version* +
The version can be an arbitrary string. It should be a unique identifier for the
change; however, Cassalog does not enforce uniqueness. This is a required field.

*tags* +
An optional set of user-supplied tags.

*revision* +
Cassalog assigns a revision number to each change that it applies. It uses the
revision number to keep track of the order in which changes are applied. If the
order of schema changes in a source script is changed, then a
ChangeSetAlteredException will be thrown.

*bucket* +
Cassalog stores multiple rows per physical partition. This is a revision offset.
The bucket size defaults to 100.
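
As a rough illustration of how a revision could map to a bucket, here is a minimal Java sketch. It rests on an assumption the README does not confirm: that the bucket value is the first revision of its group (0, 100, 200, ...). The class and method names are hypothetical, and Cassalog may compute the value differently.

```java
// Hypothetical helper, for illustration only.
class BucketSketch {

    // Default bucket size stated in the README
    public static final int DEFAULT_BUCKET_SIZE = 100;

    // Assumption: the bucket is the revision rounded down to a multiple of
    // the bucket size, so revisions 0..99 share bucket 0, 100..199 share
    // bucket 100, and so on.
    public static int bucketFor(int revision, int bucketSize) {
        if (revision < 0 || bucketSize <= 0) {
            throw new IllegalArgumentException("revision must be >= 0 and bucketSize > 0");
        }
        return revision - (revision % bucketSize);
    }
}
```

Under this assumption, the four revisions in the sample change log above all land in bucket 0, which matches the data shown.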

== Building from Source

Cassalog is built with Maven and requires JVM version 1.7 or later. Test
execution requires a running Cassandra cluster (which can be a single node) with
a node listening on 127.0.0.1. Cassandra 2.0 or later should be used.

[source,bash]
----
git clone https://github.com/jsanda/cassalog.git
cd cassalog
mvn install
----

TIP: If you want to build without having a running Cassandra instance, you can
run `mvn install -DskipTests`

== Setting up Cassandra for development or testing

The recommended way to set up Cassandra is by using
https://github.com/pcmanus/ccm[ccm (Cassandra Cluster Manager)].
As Cassalog evolves and looks to support different versions of Cassandra and
CQL, ccm is the likely tool of choice to use for testing against different
versions.